\input acphdr % Section 4.3
\open0=v243.inx
\chpar19←1 % this will prepare a ".JST" file containing justification statistics
\runninglefthead{ARITHMETIC} % chapter title
\titlepage\setcount00
\null
\vfill
\tenpoint
\ctrline{SECTION 4.3 of THE ART OF COMPUTER PROGRAMMING}
\ctrline{$\copyright$ 1980 Addison--Wesley Publishing Company, Inc.}
\vfill
\runningrighthead{MULTIPLE-PRECISION ARITHMETIC}
\section{4.3}
\eject
\setcount0 250
% Page 250
% folio 344 galley 1 *
\sectionbegin{4.3. MULTIPLE-PRECISION ARITHMETIC}
L{\:mET} {\:mUS} {\:mNOW} consider operations on numbers that have arbitrarily high
\inxf{Multiple-precision arithmetic}
pre\-cision. For simplicity in exposition, we shall assume that we are working
with integers, instead of with numbers that have an embedded radix point.
\runningrighthead{THE CLASSICAL ALGORITHMS}
\section{4.3.1}
\sectionskip
\sectionbegin{4.3.1. The Classical Algorithms}
In this section we shall discuss algorithms for
\yyskip\hang\textindent{a)}\β{addition} or \α{subtraction} of $n$-place integers,
giving an $n$-place answer and a carry;
\yskip\hang\textindent{b)}\β{multiplication} of an $n$-place integer by an $m$-place
integer, giving an $(m+n)$-place answer;
\yskip\hang\textindent{c)}\β{division} of an $(m+n)$-place integer by an $n$-place
integer, giving an $(m+1)$-place \α{quotient} and an $n$-place \α{remainder}.
\yyskip\noindent These may be called ``the classical algorithms,'' since the
word ``algorithm'' was used only in connection with these processes for
several centuries. The term ``$n$-place integer'' means any integer less than
$b↑n$, where $b$ is the radix of ordinary positional notation in which
the numbers are expressed; such numbers can be written using at most $n$
``\α{places}'' in this notation.
It is a straightforward
matter to apply the classical algorithms for integers to
numbers with embedded radix points
or to extended-precision floating point numbers, in the same way
that arithmetic operations defined for integers in \MIX\ are
applied to these more general problems.
In this section we shall study algorithms that do operations
(a), (b), and (c) above for integers expressed in radix $b$
notation, where $b$ is any given integer $≥ 2$. Thus the algorithms
are quite general definitions of arithmetic processes, and as
such they are unrelated to any particular computer. But the
discussion in this section will also be somewhat machine-oriented,
since we are chiefly concerned with efficient methods for doing
high-precision calculations by computer. Although our examples
are based on the mythical \MIX, essentially the same
considerations apply to nearly every other machine. For convenience,
we shall assume first that we have a computer (like \MIX) that
uses the signed-magnitude representation for numbers; suitable
modifications for complement notations are discussed near the
end of this section.
The most important fact to understand about extended-precision
numbers is that they may be regarded as numbers written in radix
$w$ notation, where $w$ is the computer's word size. For example,
an integer that fills 10 words on a computer whose word size
is $w = 10↑{10}$ has 100 decimal digits; but we will consider
it to be a 10-place number to the base $10↑{10}$. This viewpoint
is justified for the same reason that we may convert, say, from
\inx{conversion of representations}
binary to octal notation, simply by grouping the bits together.\xskip
(See Eq.\ 4.1--5.)
In these terms, we are given the following primitive operations
to work with:
\yskip\hang\textindent{a$↓0$)}addition or subtraction of one-place integers,
giving a one-place answer and a carry;
\penalty-200 % break here? (even page assumed)(April 1980)
\yskip\hang\textindent{b$↓0$)}multiplication of a one-place integer by another
one-place integer, giving a two-place answer;
\yskip\hang\textindent{c$↓0$)}division of a two-place integer by a one-place
integer, provided that the quotient is a one-place integer,
and yielding also a one-place remainder.
\yskip\noindent By adjusting the word size, if
necessary, nearly all computers will have these three operations
available; so we will construct algorithms (a), (b),
and (c) mentioned above in terms of the primitive operations
(a$↓0$), (b$↓0$), and (c$↓0$).
Since we are visualizing extended-precision integers as base
$b$ numbers, it is sometimes helpful to think of the situation
when $b = 10$, and to imagine that we are doing the arithmetic
by hand. Then operation (a$↓0$) is analogous to mem\-orizing the
addition table; (b$↓0$) is analogous to memorizing the multiplication
table; and (c$↓0$) is essentially memorizing the multiplication
table in reverse. The more complicated operations (a), (b),
(c) on high-precision numbers can now be done using the simple
addition, subtraction, multiplication, and long-division procedures
we are taught in elementary school. In fact, most of the algorithms
we shall discuss in this section are essentially nothing more than mechanizations
of familiar pencil-and-paper operations. Of course, we must
state the algorithms much more precisely than they have ever
been stated in the fifth grade, and we should also attempt to
minimize computer memory and running time requirements.
To avoid a tedious discussion and cumbersome notations, let
us assume that all numbers we deal with are {\sl nonnegative.}
The additional work of computing the signs, etc., is quite straightforward,
and the reader will find it easy to fill in any details of this
sort.
First comes addition, which of course is very simple, but it
is worth studying since the same ideas occur in the other algorithms
also:
\algbegin Algorithm A (Addition of nonnegative integers).
Given nonnegative $n$-place integers $(u↓1u↓2 \ldotsm u↓n)↓b$ and
$(v↓1v↓2 \ldotsm v↓n)↓b$, this algorithm forms their radix-$b$
sum, $(w↓0w↓1w↓2 \ldotsm w↓n)↓b$.\xskip (Here $w↓0$ is the ``\β{carry},''
and it will always be equal to 0 or↔1.)
\algstep A1. [Initialize.] Set $j ← n$, $k
← 0$.\xskip (The variable $j$ will run through the various digit positions,
and the variable $k$ keeps track of carries at each step.)
\algstep A2. [Add digits.] Set $w↓j ← (u↓j + v↓j + k)\mod b$,
and $k ← \lfloor (u↓j + v↓j + k)/b\rfloor $.\xskip (In other words,
$k$ is set to 1 or 0, depending on whether a ``carry'' occurs
or not, i.e., whether $u↓j + v↓j + k ≥ b$ or not. At most one
carry is possible during the two additions, since we always
have
$$u↓j + v↓j + k ≤ (b - 1) + (b - 1) + 1 < 2b,$$
by \α{induction on the computation}.)
\algstep A3. [Loop on $j$.] Decrease $j$ by one.
Now if $j > 0$, go back to step A2; otherwise set $w↓0 ← k$
and terminate the algorithm.\quad\blackslug
\penalty200\vskip 6pt plus 6pt minus 4pt
\noindent For a formal proof that Algorithm↔A
is valid, see exercise↔4.
\penalty-200 % likewise (April 1980)
\yskip A \MIX\ program for this addition process might take the following
form:
\algbegin Program A (Addition of nonnegative integers).
Let $\.{LOC}(u↓j) ≡ \.U + j$, $\.{LOC}(v↓j) ≡ \.V + j$, $\.{LOC}(w↓j) ≡
\.W + j$, $\rI1 ≡ j$, $\rA ≡ k$, word size$\null ≡ b$, $\.N ≡ n$.
$$\vbox{\mixfive{\!
01⊗⊗ENT1⊗N⊗1⊗\understep{A1. Initialize.}\quad$j ← n$.\cr
02⊗⊗JOV⊗OFLO⊗1⊗Ensure overflow is off.\cr
03⊗1H⊗ENTA⊗0⊗N + 1 - K⊗$k ← 0$.\cr
04⊗⊗J1Z⊗3F⊗N + 1 - K⊗To A3 if $j = 0$.\cr
05⊗2H⊗ADD⊗U,1⊗N⊗\understep{A2. Add di}{\sl g}\understep{its.}\cr
06⊗⊗ADD⊗V,1⊗N\cr
07⊗⊗STA⊗W,1⊗N\cr
08⊗⊗DEC1⊗1⊗N⊗\understep{A3. Loo}{\sl p\hskip-3pt}\understep{\hskip3pt\ on }$j
$\hskip-2pt\understep{\hskip2pt.}\cr
09⊗⊗JNOV⊗1B⊗N⊗If no overflow, set $k ← 0$.\cr
10⊗⊗ENTA⊗1⊗K⊗Otherwise, set $k ← 1$.\cr
11⊗⊗J1P⊗2B⊗K⊗To A2 if $j ≠ 0$.\cr
12⊗3H⊗STA⊗W⊗1⊗Store final carry in $w↓0$.\quad\blackslug\cr}}$$
The running time for this program is $10N
+ 6$ cycles, independent of the number of carries, $K$. The
quantity $K$ is analyzed in detail at the close of this section.
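\yskip For readers who wish to experiment with the algorithms of this
section in a present-day language, we shall occasionally exhibit short
Python transcriptions. Here is a minimal sketch of Algorithm↔A\null; the
function name and the digit-list convention (most significant digit first,
0-origin indexing) are ours, not part of the algorithm:

    def add(u, v, b):
        # Algorithm A: u and v are n-place digit lists in radix b,
        # most significant digit first.  Returns w[0..n], carry in w[0].
        n = len(u)
        w = [0] * (n + 1)
        k = 0                          # the carry, always 0 or 1
        for j in range(n - 1, -1, -1):
            t = u[j] + v[j] + k
            w[j + 1] = t % b           # A2: w_j <- (u_j + v_j + k) mod b
            k = t // b                 # and k <- floor((u_j + v_j + k)/b)
        w[0] = k                       # A3: store the final carry
        return w

    assert add([9, 1, 4], [0, 8, 4], 10) == [0, 9, 9, 8]   # 914 + 84 = 998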
Many modifications of Algorithm↔A are possible, and only a few
of these are mentioned in the exercises below. A chapter on
generalizations of this algorithm might be entitled ``How to
design addition circuits for a digital computer.''
The problem of subtraction is similar to addition, but the differences
are worth noting:
\algbegin Algorithm S (Subtraction of nonnegative integers).
Given nonnegative $n$-place integers $(u↓1u↓2 \ldotsm u↓n)↓b ≥ (v↓1v↓2
\ldotsm v↓n)↓b$, this algorithm forms their nonnegative \hbox{radix-$b$}
difference, $(w↓1w↓2 \ldotsm w↓n)↓b$.
\algstep S1. [Initialize.] Set $j ← n$, $k ← 0$.
\algstep S2. [Subtract digits.] Set $w↓j ← (u↓j - v↓j + k)\mod
b$, and $k ← \lfloor (u↓j - v↓j + k)/b\rfloor $.\xskip (In other
words, $k$ is set to $-1$ or 0, depending on whether a ``\α{borrow}''
occurs or not, i.e., whether $u↓j - v↓j + k < 0$ or not. In
the calculation of $w↓j$, note that we must have $-b = 0 - (b
- 1) + (-1) ≤ u↓j - v↓j + k ≤ (b - 1) - 0 + 0 < b$; hence
$0 ≤ u↓j - v↓j + k + b < 2b$, and this suggests the method of
computer implementation explained below.)
\algstep S3. [Loop on $j$.] Decrease $j$ by one. Now if
$j > 0$, go back to step S2; otherwise terminate the algorithm.\xskip
(When the algorithm terminates, we should have $k = 0$; the
condition $k = -1$ will occur if and only if $v↓1 \ldotsm v↓n
> u↓1 \ldotsm u↓n$, and this is contrary to the given assumptions.
See exercise↔12.)\quad\blackslug
\yyskip In a \MIX\ program to
implement subtraction, it is most convenient to retain the value
$1 + k$ instead of $k$ throughout the algorithm, so that we
can calculate $u↓j - v↓j + (1 + k) + (b - 1)$ in step S2.\xskip (Recall
that $b$ is the word size.)\xskip This is illustrated in the following
code.
\algbegin Program S (Subtraction of nonnegative integers).
This program is analogous to the code in Program↔A\null; we have $\rI1 ≡ j$, $\rA
≡ 1 + k$. Here, as in other programs of this section, location
\α{\.{WM1}} contains the constant $b-1$, the largest possible value that can be
stored in a \MIX\ word; cf.\ Program 4.2.3D\null, lines 38--39.
$$\vbox{\mixfive{\!
01⊗⊗ENT1⊗N⊗1⊗\understep{S1. Initialize.}\quad$j ← n$.\cr
02⊗⊗JOV⊗OFLO⊗1⊗Ensure overflow is off.\cr
03⊗1H⊗J1Z⊗DONE⊗K + 1⊗Terminate if $j = 0$.\cr
04⊗⊗ENTA⊗1⊗K⊗Set $k ← 0$.\cr
05⊗2H⊗ADD⊗U,1⊗N⊗\understep{S2. Subtract di}{\sl g}\understep{its.}\cr
06⊗⊗SUB⊗V,1⊗N⊗Compute $u↓j - v↓j + k + b$.\cr
07⊗⊗ADD⊗WM1⊗N\cr
08⊗⊗STA⊗W,1⊗N⊗(May be minus zero)\cr
09⊗⊗DEC1⊗1⊗N⊗\understep{S3. Loo}{\sl p\hskip-3pt}\understep{\hskip3pt\
on }$j$\hskip-2pt\understep{\hskip2pt.}\cr
10⊗⊗JOV⊗1B⊗N⊗If overflow, set $k ← 0$.\cr
11⊗⊗ENTA⊗0⊗N - K⊗Otherwise, set $k ← -1$.\cr
12⊗⊗J1P⊗2B⊗N - K⊗Back to S2.\cr
13⊗⊗HLT⊗5⊗⊗(Error, $v>u$)\quad\blackslug\cr}}$$
The running time for this program is $12N+3$ cycles, slightly longer than
the corresponding amount for Program↔A.
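\yskip In the same hedged Python style as before, Algorithm↔S differs from
the addition sketch only in the sign of the ``carry''; Python's floored
division makes $k ← \lfloor (u↓j - v↓j + k)/b\rfloor$ come out to $-1$ or 0
directly, whereas Program↔S retains $1 + k$ as explained in the text:

    def sub(u, v, b):
        # Algorithm S: assumes (u)_b >= (v)_b; digit lists as in add().
        n = len(u)
        w = [0] * n
        k = 0                          # the borrow, always 0 or -1
        for j in range(n - 1, -1, -1):
            t = u[j] - v[j] + k
            w[j] = t % b               # Python's % already yields 0 <= w[j] < b
            k = t // b                 # S2: k <- floor((u_j - v_j + k)/b)
        assert k == 0, "v > u, contrary to assumption (cf. exercise 12)"
        return w

    assert sub([1, 0, 0], [0, 0, 1], 10) == [0, 9, 9]      # 100 - 1 = 099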
% Vicinity of page 253
% folio 347 galley 2 *
The reader may wonder if it would not be worthwhile to have
a combined addition-subtraction routine in place of the two
algorithms A and↔S\null. But an examination of the computer programs shows that
it is generally better to use two different routines, so that
the inner loops of the computations can be performed as rapidly
as possible, since the programs are so short.
\yskip Our next problem is \β{multiplication,} and here we carry the ideas
used in Algorithm↔A a little further:
\algbegin Algorithm M (Multiplication of nonnegative integers).
Given nonnegative integers $(u↓1u↓2 \ldotsm u↓n)↓b$ and $(v↓1v↓2 \ldotsm
v↓m)↓b$, this algorithm forms their radix-$b$ product $(w↓1w↓2
\ldotsm w↓{m+n})↓b$.\xskip (The conventional pencil-and-paper method
is based on forming the partial products $(u↓1u↓2 \ldotsm u↓n) \times
v↓j$ first, for $1 ≤ j ≤ m$, and then adding these products
together with appropriate scale factors; but in a computer it
is best to do the addition concurrently with the multiplication,
as described in this algorithm.)
\algstep M1. [Initialize.] Set $w↓{m+1}$,
$w↓{m+2}$, $\ldotss$, $w↓{m+n}$ all to zero. Set $j ← m$.\xskip (If $w↓{m+1}$,
$\ldotss$, $w↓{m+n}$ were not cleared to zero in this step, it
turns out that the steps below would set
$$(w↓1 \ldotsm w↓{m+n})↓b ← (u↓1 \ldotsm u↓n)↓b \times (v↓1
\ldotsm v↓m)↓b + (w↓{m+1} \ldotsm w↓{m+n})↓b.$$
This more general operation is sometimes useful.)
\algstep M2. [Zero multiplier?] If $v↓j = 0$, set
$w↓j ← 0$ and go to step M6.\xskip (This test saves a good deal of
time if there is a reasonable chance that $v↓j$ is zero, but
otherwise it may be omitted without affecting the validity of
the algorithm.)
\algstep M3. [Initialize $i$.] Set $i ← n$, $k ← 0$.
\algstep M4. [Multiply and add.] Set $t ← u↓i \times v↓j + w↓{i+j}
+ k$; then set $w↓{i+j} ← t \mod b$ and $k ← \lfloor t/b\rfloor$.\xskip
(Here the ``\β{carry}'' $k$ will always be in the range $0 ≤ k
< b$; see below.)
\algstep M5. [Loop on $i$.] Decrease $i$ by one. Now if
$i > 0$, go back to step M4; otherwise set $w↓j ← k$.
\algstep M6. [Loop on $j$.] Decrease $j$ by one. Now if
$j > 0$, go back to step M2; otherwise the algorithm terminates.\quad\blackslug
\yyskip Algorithm M is illustrated in Table↔1,
assuming that $b = 10$, by showing the states of the computation
at the beginning of steps M5 and↔M6. A proof of Algorithm↔M
appears in the answer to exercise↔14.
The two inequalities
$$0 ≤ t < b↑2,\qquad 0 ≤ k < b\eqno (1)$$
are crucial for an efficient implementation of
this algorithm, since they point out how large a register is
needed for the computations. These inequalities may be proved
by \α{induction} as the algorithm proceeds, for if we have $k <
b$ at the start of step M4, we have
$$u↓i \times v↓j + w↓{i+j} + k ≤ (b - 1) \times (b - 1) + (b
- 1) + (b - 1) = b↑2 - 1 < b↑2.$$
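\yskip A hedged Python transcription of Algorithm↔M\null, in the digit-list
convention of the earlier sketches, shows how compact the multiply-and-add
loop of step M4 really is; inequality (1) is what permits $t$ to be held in
a double-length register on a real machine:

    def mul(u, v, b):
        # Algorithm M: (m+n)-place product of n-place u and m-place v.
        n, m = len(u), len(v)
        w = [0] * (m + n)                  # M1: clear w_{m+1} .. w_{m+n}
        for j in range(m - 1, -1, -1):
            if v[j] == 0:                  # M2: optional zero-multiplier test
                w[j] = 0
                continue
            k = 0
            for i in range(n - 1, -1, -1):
                t = u[i] * v[j] + w[i + j + 1] + k    # M4: 0 <= t < b*b
                w[i + j + 1] = t % b
                k = t // b                 # 0 <= k < b, by (1)
            w[j] = k                       # M5: final carry becomes w_j
        return w

    assert mul([9, 1, 4], [8, 4], 10) == [7, 6, 7, 7, 6]   # Table 1: 914 x 84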
\topinsert{\tablehead{Table 1}
\vskip 3pt plus 2pt minus 1pt
\ctrline{\hbox expand 6pt{MULTIPLICATION OF 914 BY 84.}}
\ninepoint
\vskip 6pt plus 2pt minus 1pt
\baselineskip0pt\lineskip0pt
\def\\{\vrule height 8.5pt depth 2.5pt}
\def\¬{\vrule height 2pt}
\tabskip 0pt plus 1000pt
\halign to size{\ctr{#}\tabskip15pt⊗#⊗$\ctr{#}$⊗$\ctr{#}$⊗$\ctr{#}$⊗$\ctr{#}$⊗$
\ctr{#}$⊗#⊗$\ctr{#}$⊗$\ctr{#}$⊗$\ctr{#}$⊗$\ctr{#}$⊗$\ctr{#}$\tabskip0pt plus 1000pt
\cr
Step⊗\\⊗i⊗j⊗u↓i⊗v↓j⊗t⊗\\⊗w↓1⊗w↓2⊗w↓3⊗w↓4⊗w↓5\cr
⊗\¬⊗⊗⊗⊗⊗⊗\¬\cr
M5⊗\\⊗3⊗2⊗4⊗4⊗16⊗\\⊗x⊗x⊗0⊗0⊗6\cr
M5⊗\\⊗2⊗2⊗1⊗4⊗05⊗\\⊗x⊗x⊗0⊗5⊗6\cr
M5⊗\\⊗1⊗2⊗9⊗4⊗36⊗\\⊗x⊗x⊗6⊗5⊗6\cr
M6⊗\\⊗0⊗2⊗x⊗4⊗36⊗\\⊗x⊗3⊗6⊗5⊗6\cr
M5⊗\\⊗3⊗1⊗4⊗8⊗37⊗\\⊗x⊗3⊗6⊗7⊗6\cr
M5⊗\\⊗2⊗1⊗1⊗8⊗17⊗\\⊗x⊗3⊗7⊗7⊗6\cr
M5⊗\\⊗1⊗1⊗9⊗8⊗76⊗\\⊗x⊗6⊗7⊗7⊗6\cr
M6⊗\\⊗0⊗1⊗x⊗8⊗76⊗\\⊗7⊗6⊗7⊗7⊗6\cr}}
The following \MIX\ program shows
the considerations that are necessary when Algorithm↔M is implemented
on a computer. The coding for step M4 would be a little simpler
if our computer had a ``multiply-and-add'' instruction, or if
it had a double-length accumulator for addition.
\algbegin Program M (Multiplication of nonnegative
integers). This program is analogous to Program↔A\null.
$\rI1 ≡ i$, $\rI2 ≡ j$, $\rI3 ≡ i + j$, $\.{CONTENTS(CARRY)} ≡ k$.
{\yyskip\tabskip 5pt\mixfive{\!
01⊗⊗ENT1⊗N⊗1⊗\understep{M1. Initialize.}\cr
02⊗⊗JOV⊗OFLO⊗1⊗Ensure overflow is off.\cr
03⊗⊗STZ⊗W+M,1⊗N⊗$w↓{m+i} ← 0$.\cr
04⊗⊗DEC1⊗1⊗N\cr
05⊗⊗J1P⊗*-2⊗N⊗Repeat for $n ≥ i > 0$.\cr
\\06⊗⊗ENT2⊗M⊗1⊗$j ← m$.\cr
\\07⊗1H⊗LDX⊗V,2⊗M⊗\understep{M2. Zero multi}{\sl p\hskip-3pt}\understep{\hskip3pt
lier?}\cr
08⊗⊗JXZ⊗8F⊗M⊗If $v↓j = 0$, set $w↓j ← 0$ and go to M6.\cr
\\09⊗⊗ENT1⊗N⊗M - Z⊗\understep{M3. Initialize }$i$\hskip-2pt\understep{\hskip2pt.}\cr
10⊗⊗ENT3⊗N,2⊗M - Z⊗$i ← n$, $(i + j) ← (n + j)$.\cr
11⊗⊗ENTX⊗0⊗M - Z⊗$k ← 0$.\cr
\\12⊗2H⊗STX⊗CARRY⊗(M - Z)N⊗\understep{M4. Multi}{\sl p\hskip-3pt}\understep{\hskip
3pt l\hskip1pt}{\sl\hskip-1pt y\hskip-2pt}\understep{\hskip2pt\ and add.}\cr
13⊗⊗LDA⊗U,1⊗(M - Z)N\cr
14⊗⊗MUL⊗V,2⊗(M - Z)N⊗$\rAX←u↓i\times v↓j.$\cr
\\15⊗⊗SLC⊗5⊗(M - Z)N⊗Interchange $\rA \xch rX$.\cr
\\16⊗⊗ADD⊗W,3⊗(M - Z)N⊗Add $w↓{i+j}$ to lower half.\cr
17⊗⊗JNOV⊗*+2⊗(M - Z)N⊗Did overflow occur?\cr
18⊗⊗INCX⊗1⊗K⊗If so, carry 1 into upper half.\cr
\\19⊗⊗ADD⊗CARRY⊗(M - Z)N⊗Add $k$ to lower half.\cr
20⊗⊗JNOV⊗*+2⊗(M - Z)N⊗Did overflow occur?\cr
21⊗⊗INCX⊗1⊗K↑\prime ⊗If so, carry 1 into upper half.\cr
\\22⊗⊗STA⊗W,3⊗(M - Z)N⊗$w↓{i+j} ← t \mod b$.\cr
\\23⊗⊗DEC1⊗1⊗(M - Z)N⊗\understep{M5. Loo}{\sl p\hskip-3pt}\understep{\hskip3pt\
on $i$.}\cr
24⊗⊗DEC3⊗1⊗(M - Z)N⊗Decrease $i$ and $(i + j)$ by 1.\cr
25⊗⊗J1P⊗2B⊗(M - Z)N⊗Back to M4 if $i > 0$; $\rX = \lfloor
t/b\rfloor$.\cr
\\26⊗8H⊗STX⊗W,2⊗M⊗Set $w↓j ← k$.\cr
\\27⊗⊗DEC2⊗1⊗M⊗\understep{M6. Loo}{\sl p\hskip-3pt}\understep{\hskip3pt\ on }$j
$\hskip-2pt\understep{\hskip2pt.}\cr
28⊗⊗J2P⊗1B⊗M⊗Repeat until $j = 0$.\quad\blackslug\cr}}
\yyskip The execution time of Program↔M depends on
the number of places, $M$, in the multiplier $v$; the number of
places, $N$, in the multiplicand $u$; the number of zeros, $Z$,
in the multiplier; and the number of carries, $K$ and $K↑\prime$,
that occur during the addition to the lower half of the product
in the computation of $t$. If we approximate both $K$ and $K↑\prime$
by the reasonable (although somewhat pessimistic) values ${1\over
2}(M - Z)N$, we find that the total running time comes to
$28MN + 10M + 4N + 3 - Z(28N + 3)$ cycles. If step M2 were deleted,
the running time would be $28MN + 7M + 4N + 3$ cycles, so this
step is not advantageous unless the density of zero positions
within the multiplier is $Z/M > 3/(28N + 3)$. If the multiplier
is chosen completely at random, this ratio $Z/M$ is expected
to be only about $1/b$, which is extremely small; so step M2
is usually {\sl not} worthwhile, unless $b$ is small.
Algorithm M is not the fastest way to multiply when $m$ and
$n$ are large, although it has the advantage of simplicity.
Speedier methods are discussed in Section 4.3.3; even when $m
= n = 4$, it is possible to multiply numbers in a little less
time than is required by Algorithm↔M.
\yskip The final algorithm of concern to
us in this section is long \β{division,} in which we want to divide
$(n + m)$-place integers by $n$-place integers. Here the ordinary
pencil-and-paper method involves a certain amount of guess\-work
and ingenuity on the part of the person doing the division;
we must either eliminate this guess\-work from the algorithm or
develop some theory to explain it more carefully.
A moment's reflection about the ordinary process of long division
shows that the general problem breaks down into simpler steps,
each of which is the division of an $(n + 1)$-place number $u$
by the $n$-place divisor $v$, where $0 ≤ u/v < b$; the remainder
$r$ after each step is less than $v$, so we may use the quantity $rb + \null$(next
place of dividend) as the new $u$ in the succeeding step. For
example, if we are asked to divide 3142 by 47, we first divide
314 by 47, getting 6 and a remainder of 32; then we divide
322 by 47, getting 6 and a remainder of 40; thus we have a quotient
of 66 and a remainder of 40. It is clear that this same idea
works in general, and so our search for an appropriate division
algorithm reduces to the following problem (Fig.↔6):
$$\hbox par 335pt{\sl Let $u = (u↓0u↓1 \ldotsm u↓n)↓b$ and $v
= (v↓1v↓2 \ldotsm v↓n)↓b$ be nonnegative integers in radix-$b$ notation,
such that $u/v < b$. Find an algorithm to determine $q = \lfloor
u/v\rfloor $.}$$
We may observe that the condition $u/v
< b$ is equivalent to the condition that $u/b < v$; i.e., $\lfloor
u/b\rfloor < v$. This is simply the condition that $(u↓0u↓1 \ldotsm
u↓{n-1})↓b < (v↓1v↓2 \ldotsm v↓n)↓b$. Furthermore, if we write $r =
u - qv$, then $q$ is the unique integer such that $0 ≤ r < v$.
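The reduction just described is easily expressed in code. The following
hedged Python fragment reproduces the $3142 ÷ 47$ example, using built-in
integer division for the one-digit quotient steps that Algorithm↔D below
will carry out by means of trial digits:

    def schoolbook_div(u_digits, v, b):
        # Divide (u_1 u_2 ... u_{m+n})_b by v, one quotient digit at a time;
        # each step divides r*b + (next place of dividend), where u/v < b.
        q_digits, r = [], 0
        for d in u_digits:
            u = r * b + d
            q_digits.append(u // v)
            r = u % v
        return q_digits, r

    assert schoolbook_div([3, 1, 4, 2], 47, 10) == ([0, 0, 6, 6], 40)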
\topinsert{\ctrline{\!
$\vcenter{
\inx{COPY PREP: Fig 4-6 to be inserted}
\hbox par 36mm{\caption Fig.\ 6. Wanted: a way to determine $q$
rapidly.}}$\qquad\qquad\lower7.5mm\vbox to 20mm{}\hskip 30mm}}
% Vicinity of page 256
% folio 350 galley 3 WARNING: Much tape unreadable! *
\def\q{\A q}\def\r{\A r}
The most obvious approach to this problem
is to make a guess about $q$, based on the most significant
digits of $u$ and $v$. It isn't obvious that such a method will
be reliable enough, but it is worth investigating; let us therefore
set
$$\q = \min\left(\,\left\lfloor u↓0b + u↓1\over v↓1\right\rfloor\,,\,b -
1\right).\eqno (2)$$
This formula says $\q$ is obtained
by dividing the two leading digits of $u$ by the leading digit
of $v$; and if the result is $b$ or more we can replace it by
$(b - 1)$.
It is a remarkable fact, which we will now investigate, that
this value $\q$ is always a very good approximation to
the desired answer $q$, so long as $v↓1$ is reasonably large.
In order to analyze how close $\q$ comes to $q$, we will
first prove that $\q$ is never too small.
\thbegin Theorem A. {\sl In the notation
above, $\q ≥ q$.}
\proofbegin Since $q ≤ b - 1$, the theorem
is certainly true if $\q = b - 1$. Otherwise we have
$\q = \lfloor (u↓0b + u↓1)/v↓1\rfloor $, hence $\q v↓1 ≥ u↓0b + u↓1
- v↓1 + 1$. It follows that
$$\eqalign{u - \q v ≤ u - \q v↓1b↑{n-1} ⊗≤ u↓0b↑n
+ \cdots + u↓n\cr
\noalign{\vskip 2pt}
⊗\qquad\null - (u↓0b↑n + u↓1b↑{n-1} - v↓1b↑{n-1} + b↑{n-1})\cr
\noalign{\vskip 4pt}
⊗ = u↓2b↑{n-2}\!+ \cdots +\!u↓n\!-\!b↑{n-1}\!+\!v↓1b↑{n-1}
< v↓1b↑{n-1} ≤ v.\cr}$$
Since $u - \q v < v$, we must have $\q ≥ q$.\quad\blackslug
\yyskip We will now prove that $\q$
cannot be much larger than $q$ in practical situations. Assume
that $\q ≥ q + 3$. We have
$$\q ≤ {u↓0b + u↓1\over v↓1} = {u↓0b↑n + u↓1b↑{n-1}\over
v↓1b↑{n-1}} ≤ {u\over v↓1b↑{n-1}} < {u\over v - b↑{n-1}}.$$
$\biglp$The case $v = b↑{n-1}$ is impossible, for if $v
= (100 \ldotsm 0)↓b$ then $q = \q.\bigrp$ Further\-more, the relation
$q > (u/v) - 1$ implies that
$$3 ≤ \q - q < {u\over v - b↑{n-1}} - {u\over v} + 1
= {u\over v} \left(b↑{n-1}\over v - b↑{n-1}\right) + 1.$$
Therefore
$${u\over v} > 2\,\left(v - b↑{n-1}\over b↑{n-1}\right) ≥ 2(v↓1
- 1).$$
Finally, since $b - 4 ≥ \q - 3 ≥ q = \lfloor
u/v\rfloor ≥ 2(v↓1 - 1)$, we have $v↓1 < \lfloor b/2\rfloor
$. This proves the result we seek:
\thbegin Theorem B. {\sl If $v↓1 ≥ \lfloor b/2\rfloor$,
then $\q - 2 ≤ q ≤ \q$.}\quad\blackslug
\yyskip The most important part of this
theorem is that {\sl the conclusion is independent of\/} $b$; no
matter how large $b$ is, the trial quotient $\q$ will
never be more than 2 in error.
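Theorems A and↔B are easy to check exhaustively for a small radix. The
following hedged Python sketch tries every two-place divisor with $v↓1 ≥
\lfloor b/2\rfloor$ and every three-place dividend with $u/v < b$,
confirming that $q ≤ \q ≤ q + 2$:

    def check_qhat(b):
        # Exhaustive verification of Theorems A and B for n = 2.
        for v1 in range(b // 2, b):        # normalized divisors only
            for v2 in range(b):
                v = v1 * b + v2
                for u in range(v * b):     # guarantees u/v < b
                    u0, u1 = u // (b * b), (u // b) % b
                    qhat = min((u0 * b + u1) // v1, b - 1)   # Eq. (2)
                    q = u // v
                    assert q <= qhat <= q + 2, (u, v, q, qhat)

    check_qhat(10)                         # no assertion fires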
The condition that $v↓1 ≥ \lfloor b/2\rfloor$ is very much like
a normalization requirement; in fact, it is exactly the condition
of floating-binary normalization in a binary computer. One simple way to ensure
that $v↓1$ is sufficiently large is to multiply both $u$
and $v$ by $\lfloor b/(v↓1 + 1)\rfloor $; this does not change
the value of $u/v$, nor does it increase the number of places
in $v$, and exercise↔23 proves that it will always make the
new value of $v↓1$ large enough.\xskip ({\sl Note:} Another
way to normalize the divisor is discussed in exercise↔28.)
Now that we have armed ourselves with all of these facts, we
are in a position to write the desired long-division algorithm.
This algorithm uses a slightly improved choice of $\q$
in step D3, which guarantees that $q = \q$ or $\q
- 1$; in fact, the improved choice of $\q$ made here
is almost always accurate.
\algbegin Algorithm D (Division of nonnegative
integers). Given nonnegative integers $u = (u↓1u↓2 \ldotsm
u↓{m+n})↓b$ and $v = (v↓1v↓2 \ldotsm v↓n)↓b$, where $v↓1
≠ 0$ and $n > 1$, we form the \hbox{radix-$b$} quotient $\lfloor u/v\rfloor =
(q↓0q↓1 \ldotsm q↓m)↓b$ and the remainder $u \mod v = (r↓1r↓2
\ldotsm r↓n)↓b$.\xskip (This notation is slightly different from that
used in the above proofs. When $n = 1$, the simpler algorithm
of exercise↔16 should be used.)
\algstep D1. [Normalize.] Set $d ← \lfloor
b/(v↓1 + 1)\rfloor $. Then set $(u↓0u↓1u↓2 \ldotsm u↓{m+n})↓b$ equal to
$(u↓1u↓2 \ldotsm u↓{m+n})↓b$ times $d$, and set $(v↓1v↓2 \ldotsm v↓n)↓b$ equal
to $(v↓1v↓2 \ldotsm v↓n)↓b$ times↔$d$.\xskip (Note the introduction of
a new digit position $u↓0$ at the left of $u↓1$; if $d = 1$,
all we need to do in this step is to set $u↓0 ← 0$. On a binary
computer it may be preferable to choose $d$ to be a power of
2 instead of using the value suggested here; any value of $d$
that results in $v↓1 ≥ \lfloor b/2\rfloor$ will suffice. See also exercise↔37.)
\algstep D2. [Initialize $j$.] Set $j ← 0$.\xskip (The loop on
$j$, steps D2 through D7, will be essentially a division of
$(u↓ju↓{j+1} \ldotsm u↓{j+n})↓b$ by $(v↓1v↓2 \ldotsm v↓n)↓b$ to get a
single quotient digit $q↓j$; cf.\ Fig.↔6.)
\algstep D3. [Calculate $\q$.] If $u↓j = v↓1$, set $\q
← b - 1$; otherwise set $\q ← \lfloor (u↓jb + u↓{j+1})/v↓1\rfloor
$. Now test if $v↓2\q > (u↓jb + u↓{j+1} - \q v↓1)b
+ u↓{j+2}$; if so, decrease $\q$ by 1 and repeat this
test.\xskip (The latter test determines at high speed most of the
cases in which the trial value $\q$ is one too large,
and it eliminates {\sl all} cases where $\q$ is two too
large; see exercises 19, 20,↔21.)
\algstep D4. [Multiply and subtract.] Replace $(u↓ju↓{j+1} \ldotsm
u↓{j+n})↓b$ by $(u↓ju↓{j+1} \ldotsm u↓{j+n})↓b$ minus $\q$ times
$(v↓1v↓2 \ldotsm v↓n)↓b$. This step (analogous to steps M3, M4, and M5
of Algorithm↔M) consists of a simple multiplication by a one-place
number, combined with a subtraction. The digits $(u↓j,u↓{j+1},
\ldotss,u↓{j+n})$ should be kept positive; if the result of this
step is actually negative, $(u↓ju↓{j+1} \ldotsm u↓{j+n})↓b$ should
be left as the true value plus $b↑{n+1}$, i.e., as the $b$'s
complement of the true value, and a ``\α{borrow}'' to the left should
be remembered.
\algstep D5. [Test remainder.] Set $q↓j ←
\q$. If the result of step D4 was negative, go to step
D6; otherwise go on to step D7.
\algstep D6. [Add back.] (The probability that this step is
necessary is very small, on the order of only $2/b$, as shown in exercise↔21;
test data that activates this step should therefore be
specifically contrived when debugging.)\xskip Decrease $q↓j$ by 1,
and add $(0v↓1v↓2 \ldotsm v↓n)↓b$ to $(u↓ju↓{j+1}u↓{j+2} \ldotsm u↓{j+n})↓b$.\xskip
(A \α{carry} will occur to the left of $u↓j$, and it should be
ignored since it cancels with the ``borrow'' that occurred
in D4.)
\algstep D7. [Loop on $j$.] Increase $j$ by one. Now if
$j ≤ m$, go back to D3.
\algstep D8. [Unnormalize.] Now $(q↓0q↓1 \ldotsm q↓m)↓b$ is the desired
quotient, and the desired remainder may be obtained by dividing
$(u↓{m+1} \ldotsm u↓{m+n})↓b$ by $d$.\quad\blackslug
\topinsert{\vskip 49mm
\inx{COPY PREP: Fig 4-7 to be inserted}
\ctrline{\caption Fig.\ 7. Long division.}}
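As a cross-check on steps D1--D8, here is a hedged Python transcription of
Algorithm↔D in the digit-list convention of the earlier sketches. The
helper functions are ours; for brevity the normalization of step D1 and
the unnormalization of step D8 go through Python integers, where in
practice one would use a one-place multiplication (exercise 13) and
division (exercise 16) instead:

    def intval(w, b):                      # digit list -> integer
        t = 0
        for digit in w:
            t = t * b + digit
        return t

    def digits(t, b, places):              # integer -> `places` digits
        out = [0] * places
        for i in range(places - 1, -1, -1):
            out[i], t = t % b, t // b
        return out

    def div(u, v, b):
        # Algorithm D: divide (m+n)-place u by n-place v; v[0] != 0, n > 1.
        n, m = len(v), len(u) - len(v)
        d = b // (v[0] + 1)                        # D1: scaling factor
        u = digits(intval(u, b) * d, b, m + n + 1) # gains the new digit u_0
        v = digits(intval(v, b) * d, b, n)         # now v[0] >= b // 2
        q = [0] * (m + 1)
        for j in range(m + 1):                     # D2, D7: loop on j
            # D3: trial digit from three leading digits of u, two of v
            qh = b - 1 if u[j] == v[0] else (u[j] * b + u[j + 1]) // v[0]
            rh = u[j] * b + u[j + 1] - qh * v[0]
            while rh < b and v[1] * qh > rh * b + u[j + 2]:
                qh, rh = qh - 1, rh + v[0]         # now qh = q_j or q_j + 1
            k = 0                                  # D4: multiply and subtract
            for i in range(n - 1, -1, -1):
                t = u[j + 1 + i] - qh * v[i] + k
                u[j + 1 + i], k = t % b, t // b    # k <= 0 is the borrow
            u[j] += k
            q[j] = qh                              # D5: test remainder
            if u[j] < 0:                           # D6: add back (rare, ~2/b)
                q[j] -= 1
                k = 0
                for i in range(n - 1, -1, -1):
                    t = u[j + 1 + i] + v[i] + k
                    u[j + 1 + i], k = t % b, t // b
                u[j] += k                          # carry cancels the borrow
        return q, intval(u[m + 1:], b) // d        # D8: exact (exercise 27)

    assert div([3, 1, 4, 2], [4, 7], 10) == ([0, 6, 6], 40)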
\yyskip The representation
of Algorithm↔D as a \MIX\ program has several points of interest:
\algbegin Program D (Division of nonnegative integers).
The conventions of this program are analogous to Program↔A\null;
$\rI1 ≡ i$, $\rI2 ≡ j - m$, $\rI3 ≡ i + j$.
{\yyskip\tabskip 5pt\mixfive{\!
001⊗D1⊗JOV⊗OFLO⊗1⊗\understep{D1. Normalize.}\cr
$\cdots$⊗⊗⊗⊗⊗(See exercise 25)\cr
\\039⊗D2⊗ENN2⊗M⊗1⊗\understep{D2. Initialize }$j$\hskip-2pt\understep{\hskip2pt.}\cr
040⊗⊗STZ⊗V⊗1⊗Set $v↓0 ← 0$, for convenience in D4.\cr
\\041⊗D3⊗LDA⊗U+M,2(1:5)\hskip-25pt
⊗M + 1⊗\understep{D3. Calculate \hskip2.5pt}\hskip-2.5pt$\q$\!
\hskip-1pt\understep{\hskip1pt.}\cr
042⊗⊗LDX⊗U+M+1,2⊗M + 1⊗$\rAX ← u↓jb + u↓{j+1}$.\cr
\\043⊗⊗DIV⊗V+1⊗M + 1⊗$\rA ← \lfloor \rAX/v↓1\rfloor$.\cr
\\044⊗⊗JOV⊗1F⊗M + 1⊗Jump if quotient$\null=b$.\cr
045⊗⊗STA⊗QHAT⊗M + 1⊗$\q ← \rA$.\cr
046⊗⊗STX⊗RHAT⊗M + 1⊗$\r ← u↓jb + u↓{j+1} - \q v↓1$\cr
047⊗⊗JMP⊗2F⊗M + 1⊗\qquad$=(u↓jb + u↓{j+1})\mod v↓1$.\cr
\\048⊗1H⊗LDX⊗WM1⊗⊗$\rX ← b - 1$.\cr
049⊗⊗LDA⊗U+M+1,2⊗⊗$\rA ← u↓{j+1}$.\xskip (Here $u↓j = v↓1$.)\cr
050⊗⊗JMP⊗4F\cr
\\051⊗3H⊗LDX⊗QHAT⊗E\cr
052⊗⊗DECX⊗1⊗E⊗Decrease $\q$ by one.\cr
053⊗⊗LDA⊗RHAT⊗E⊗Adjust $\r$ accordingly:\cr
054⊗4H⊗STX⊗QHAT⊗E⊗$\q←\rX$.\cr
055⊗⊗ADD⊗V+1⊗E⊗$\rA←\rA+v↓1$.\cr
056⊗⊗JOV⊗D4⊗E⊗(If \r\ will be $≥b$, $v↓2\q$ will be $<\r b$.)\cr
\\057⊗⊗STA⊗RHAT⊗E⊗$\r←\rA$.\cr
058⊗⊗LDA⊗QHAT⊗E\cr
\\059⊗2H⊗MUL⊗V+2⊗M+E+1\cr
060⊗⊗CMPA⊗RHAT⊗M+E+1⊗Test if $v↓2\q≤\r b+u↓{j+2}$.\cr
061⊗⊗JL⊗D4⊗M+E+1\cr
062⊗⊗JG⊗3B⊗E\cr
063⊗⊗CMPX⊗U+M+2,2\cr
064⊗⊗JG⊗3B⊗⊗If not, $\q$ is too large.\cr
\\065⊗D4⊗ENTX⊗1⊗M+1⊗\understep{D4. Multi}{\sl p\hskip-3pt}\understep{\hskip
3pt l\hskip1pt}{\sl\hskip-1pt y\hskip-2pt}\understep{\hskip2pt\ and subtract.}\cr
066⊗⊗ENT1⊗N⊗M+1⊗$i←n$.\cr
067⊗⊗ENT3⊗M+N,2⊗M+1⊗$(i+j)←(n+j)$.\cr
\\068⊗2H⊗STX⊗CARRY⊗(M+1)(N+1)⊗(Here $1-b<\rX≤+1$.)\cr
069⊗⊗LDAN⊗V,1⊗(M+1)(N+1)\cr
\\070⊗⊗MUL⊗QHAT⊗(M+1)(N+1)⊗$\rAX←-\q v↓i$.\cr
071⊗⊗SLC⊗5⊗(M+1)(N+1)⊗Interchange $\rA \xch \rX$.\cr
\\072⊗⊗ADD⊗CARRY⊗(M+1)(N+1)⊗Add the contribution from the\cr
073⊗⊗JNOV⊗*+2⊗(M+1)(N+1)⊗\qquad digit to the
right, plus 1.\cr
074⊗⊗DECX⊗1⊗K⊗If sum is $≤ -b$, carry $-1$.\cr
\\075⊗⊗ADD⊗U,3⊗(M+1)(N+1)⊗Add $u↓{i+j}$.\cr
\\076⊗⊗ADD⊗WM1⊗(M+1)(N+1)⊗Add $b - 1$ to force + sign.\cr
077⊗⊗JNOV⊗*+2⊗(M+1)(N+1)⊗If no overflow, carry $-1$.\cr
078⊗⊗INCX⊗1⊗K↑\prime ⊗$\rX ≡ \hbox{carry}+1$.\cr
\\079⊗⊗STA⊗U,3⊗(M+1)(N+1)⊗$u↓{i+j} ← \rA$ (may be minus zero).\cr
080⊗⊗DEC1⊗1⊗(M+1)(N+1)\cr
081⊗⊗DEC3⊗1⊗(M+1)(N+1)\cr
082⊗⊗J1NN⊗2B⊗(M+1)(N+1)⊗Repeat for $n≥ i ≥ 0$.\cr
\\083⊗D5⊗LDA⊗QHAT⊗M + 1⊗\understep{D5. Test remainder.}\cr
084⊗⊗STA⊗Q+M,2⊗M + 1⊗Set $q↓j ← \q$.\cr
085⊗⊗JXP⊗D7⊗M + 1⊗(Here $\rX = 0$ or 1, since $v↓0 = 0$.)\cr
\\086⊗D6⊗DECA⊗1⊗⊗\understep{D6. Add back.}\cr
087⊗⊗STA⊗Q+M,2⊗⊗Set $q↓j ← \q - 1$.\cr
\\088⊗⊗ENT1⊗N⊗⊗$i ← n$.\cr
089⊗⊗ENT3⊗M+N,2⊗⊗$(i + j) ← (n + j)$.\cr
\\090⊗1H⊗ENTA⊗0⊗⊗(This is essentially Program↔A.)\cr
091⊗2H⊗ADD⊗U,3\cr
092⊗⊗ADD⊗V,1\cr
093⊗⊗STA⊗U,3\cr
\\094⊗⊗DEC1⊗1\cr
095⊗⊗DEC3⊗1\cr
096⊗⊗JNOV⊗1B\cr
097⊗⊗ENTA⊗1\cr
098⊗⊗J1NN⊗2B\cr
\\099⊗D7⊗INC2⊗1⊗M+1⊗\understep{D7. Loo}{\sl p\hskip-3pt}\understep{\hskip3pt\ on }$j
$\hskip-2pt\understep{\hskip2pt.}\cr
100⊗⊗J2NP⊗D3⊗M + 1⊗Repeat for $0 ≤ j ≤ m$.\cr
101⊗D8⊗$\cdots$⊗⊗⊗(See exercise 26)\quad\blackslug\cr}}
% Vicinity of page 260
% folio 354 galley 4 *
\yyskip Note how easily the rather
complex-appearing calculations and decisions of step D3 can
be handled inside the machine. Note also that the program for
step↔D4 is analogous to Program↔M\null, except that the ideas of
Program↔S have also been incorporated.
The running time for Program↔D can be estimated by considering
the quantities $M$, $N$, $E$, $K$, and $K↑\prime$ shown in the program.\xskip
(These quantities ignore several situations that occur only
with very low probability; for example, we may assume that lines
048--050, 063--064, and step D6 are never executed.)\xskip Here $M
+ 1$ is the number of words in the quotient; $N$ is the number
of words in the divisor; $E$ is the number of times $\A q$
is adjusted downwards in step D3; $K$ and $K↑\prime$ are
the number of times certain ``carry'' adjustments are made during
the multiply-subtract loop. If we assume that $K + K↑\prime$
is approximately $(N + 1)(M + 1)$, and that $E$ is approximately
${1\over 2}M$, we get a total running time of approximately
$$30MN + 30N + 89M + 111$$
cycles, plus $67N + 235M + 4$ more if $d > 1$.\xskip (The
program segments of exercises 25 and↔26 are included in these
totals.)\xskip When $M$ and $N$ are large, this is only about seven
percent longer than the time Program↔M takes to multiply the
quotient by the divisor.
Further commentary on Algorithm↔D appears in the exercises at
the close of this section.
\penalty-500 % try to do it here (April 1980)
\yskip It is possible to debug programs
for multiple-precision arithmetic by using the multiplication
and addition routines to check the result of the division routine,
etc. The following type of test data is occasionally useful:
$$(t↑m - 1)(t↑n - 1) = t↑{m+n} - t↑n - t↑m + 1.$$
If $m < n$, this number has the radix-$t$ expansion
$$\def\upbrace{$\char'774$\leaders\hrule height 1.5pt\hfill$\char'773
\char'772$\leaders\hrule height 1.5pt\hfill$\char'775$}
\vbox{\halign{#⊗#⊗#⊗#⊗#⊗#\cr
$(t-1)\quad\ldots\quad(t-1)$⊗$\quad(t-2)\quad$⊗$(t-1)\quad\ldots\quad(t-1)$⊗\quad
⊗$0\quad\ldots\quad0$⊗\quad1;\cr
\upbrace⊗⊗\upbrace⊗⊗\upbrace\cr
\hfill$m-1$ places\hfill⊗⊗\hfill$n-m$ places\hfill⊗⊗\hskip-10pt plus 1000pt $m-1$
places\hskip-10pt plus 1000pt\cr}}$$
for example, $(10↑3 - 1)(10↑5 - 1) = 99899001$.
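Such test numbers are trivial to generate; a hedged Python fragment
suffices:

    def test_number(t, m, n):
        # (t^m - 1)(t^n - 1), whose radix-t digits follow the pattern above
        assert m < n
        return (t**m - 1) * (t**n - 1)

    assert test_number(10, 3, 5) == 99899001       # the example in the text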
In the case of Program↔D\null, it is also necessary to find some
test cases that cause the rarely executed parts of the program
to be exercised; some portions of that program would probably never
get tested even if a million random test cases were tried.
Now that we have seen how to operate with signed-magnitude numbers,
let us consider what approach should be taken to the same problems
when a computer with \β{complement notation} is being used. For
\β{two's complement} and \β{ones' complement} notations, it is usually best
to let the radix $b$ be {\sl one half\/} of the word size; thus for a
32-bit computer word we would use $b = 2↑{31}$ in the above
algorithms. The sign bit of all but the most significant word
of a multiple-precision number will be zero, so that no anomalous
sign correction takes place during the computer's multiplication
and division operations. In fact, the basic meaning of complement
notation requires that we consider all but the most significant
word to be nonnegative. For example, assuming an 8-bit word,
the two's complement number
$$11011111\quad 1111110\quad 1101011$$
(where the sign bit is shown only in the most significant
word) is properly thought of as
$$-2↑{21} + (1011111)↓2 \cdot 2↑{14} + (1111110)↓2 \cdot
2↑7 + (1101011)↓2.$$
Addition of signed numbers is slightly
easier when complement notations are being used, since the routine
for adding $n$-place nonnegative integers can be used for arbitrary
$n$-place integers; the sign appears only in the first word,
so the less significant words may be added together irrespective
of the actual sign.\xskip (Special attention must be given to the
leftmost carry when ones' complement notation is being used,
however; it must be added into the least significant word, and
possibly propagated further to the left.)\xskip Similarly, we find
that subtraction of signed numbers is slightly simpler with
complement notation. On the other hand, multiplication and division
seem to be done most easily by working with nonnegative quantities
and doing suitable complementation operations beforehand to
make sure that both operands are nonnegative; it may be possible
to avoid this complementation by devising some tricks for working
directly with negative numbers in a complement notation, and
it is not hard to see how this could be done in double-precision
multiplication, but care should be taken not to slow down the
inner loops of the subroutines when high precision is required.
Note that the product of two $m$-place numbers in two's complement
notation may require $2m + 1$ places: the square of $(-b↑m)$ is
$b↑{2m}$.
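To make the convention concrete, here is a hedged Python sketch that
evaluates a multiword two's complement number with radix $b$ equal to half
the word size, as recommended above (the function name and the word-list
layout, most significant word first, are ours):

    def signed_value(words, word_bits):
        # Only words[0] may have its sign bit set; the other words are
        # plain nonnegative digits in radix b = 2**(word_bits - 1).
        b = 1 << (word_bits - 1)
        lead = words[0] - (2 * b if words[0] >= b else 0)  # sign-extend
        value = lead
        for w in words[1:]:
            value = value * b + w
        return value

    # The 8-bit example in the text:
    assert signed_value([0b11011111, 0b1111110, 0b1101011], 8) == \
           -2**21 + 0b1011111 * 2**14 + 0b1111110 * 2**7 + 0b1101011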
\yskip Let us now turn to an analysis of
\inxf{analysis of algorithms}
the quantity $K$ that arises in Program↔A\null, i.e., the number
of \β{carries} that occur when two $n$-place numbers are being added
together. Although $K$ has no effect on the total running
time of Program↔A\null, it does affect the running time of
Program↔A's counterparts that deal with complement notations,
and its analysis is interesting in itself as a significant application
of \β{generating functions}.
Suppose that $u$ and $v$ are independent random $n$-place
integers, uniformly distributed in the range $0 ≤ u, v < b↑n$.
Let $p↓{nk}$ be the probability that exactly $k$ carries occur
in the addition of $u$ to $v$, {\sl and} that one of these carries
occurs in the most significant position (so that $u + v ≥
b↑n)$. Similarly, let $q↓{nk}$ be the probability that exactly
$k$ carries occur, but that there is no carry in the most significant
position. Then it is not hard to see that
$$\baselineskip 27pt
\eqalignno{p↓{0k}=0,\quad⊗\quad q↓{0k}=\delta↓{0k},\qquad\hbox{for all }k;\cr
p↓{(n+1)(k+1)}⊗= {b + 1\over 2b} p↓{nk} +
{b - 1\over 2b} q↓{nk},⊗(3)\cr
q↓{(n+1)k} ⊗=
{b - 1\over 2b} p↓{nk} + {b + 1\over 2b} q↓{nk};\cr}$$
this happens because $(b - 1)/2b$ is the probability
that $u↓1 + v↓1 ≥ b$ and $(b + 1)/2b$ is the probability that
$u↓1 + v↓1 + 1 ≥ b$, when $u↓1$ and $v↓1$ are independently
and uniformly distributed integers in the range $0 ≤ u↓1, v↓1
< b$.
To obtain further information about these
quantities $p↓{nk}$ and $q↓{nk}$, we may set up the generating
functions
$$P(z, t) = \sum ↓{k,n}p↓{nk}\,z↑kt↑n,\qquad Q(z, t) = \sum ↓{k,n}q↓{nk}\,z↑kt↑n.
\eqno(4)$$
From (3) we have the basic relations
$$\eqalign{P(z, t) ⊗= zt\, \left({b + 1\over 2b} P(z, t) +
{b - 1\over 2b} Q(z, t)\right)\,,\cr
\noalign{\vskip 4pt}
Q(z, t) ⊗= 1 + t\, \left({b - 1\over 2b} P(z,
t) + {b + 1\over 2b} Q(z, t)\right)\,.\cr}$$
These two equations are readily solved for $P(z, t)$ and $Q(z,t)$;
and if we let
$$G(z, t) = P(z, t) + Q(z, t) = \sum ↓{n}G↓n(z)t↑n,$$
where $G↓n(z)$ is the generating function for the
total number of carries when $n$-place numbers are added, we
find that
$$\textstyle\quad G(z, t) = (b - zt)/p(z, t),\quad\hbox{where }p(z, t) = b
- {1\over 2}(1 + b)(1 + z)t + zt↑2.\eqno (5)$$
Note that $G(1, t) = 1/(1 - t)$, and this checks
with the fact that $G↓n(1)$ must equal 1 (it is the sum of all
the possible probabilities). Taking partial derivatives of (5)
with respect to $z$, we find that
\def\\{\hskip-1.5pt}
$$\eqalign{{∂G\over ∂z} ⊗= \sum ↓{n}G↑{\prime}↓{\\n}(z)t↑n
= {-t\over p(z, t)} + {t(b - zt)(b+1-2t)\over 2p(z,t)↑2};\cr
{∂↑2G\over∂z↑2}⊗=\sum↓n G↑{\prime\prime}↓{\\n}(z)t↑n={-t↑2(b+1-2t)\over p(z,t)↑2}+
{t↑2(b-zt)(b+1-2t)↑2\over 2p(z,t)↑3}.\cr}$$
Now let us put $z = 1$ and expand in
partial fractions:
$$\eqalign{\sum ↓{n}G↑{\prime}↓{\\n}(1)t↑n⊗ = {t\over 2} \left({1\over
(1 - t)↑2} - {1\over (b - 1)(1 - t)} + {1\over (b - 1)(b - t)}\right)\,,\cr
\sum ↓{n}G↑{\prime\prime}↓{\\n}(1)t↑n ⊗= {t↑2\over 2} \left({1\over
(1 - t)↑3} - {1\over (b - 1)↑2(1 - t)} + {1\over (b - 1)↑2(b
- t)}\right.\cr
\noalign{\vskip-2pt}
⊗\hskip 65mm\left.\null + {1\over (b - 1)(b - t)↑2}\right)\,.\cr}$$
It follows that the average number of carries,
i.e., the mean value of $K$, is
$$G↑{\prime}↓{\\n}(1) = {1\over 2}\biggglp n
- {1\over b - 1} \bigglp 1 - \left(1\over b\right)↑n\,\biggrp\bigggrp\,
;\eqno (6)$$
the variance is
$$\vbox{\halign{\hbox to size{$\dispstyle#$}\cr
\9G↑{\prime\prime}↓{\\n}(1) + G↑{\prime}↓{\\n}(1) - G↑{\prime}↓{\\n}(1)↑2\hfill\cr
\noalign{\vskip 3pt}
\hfill = {1\over 4}\biggglp n + {2n\over
b - 1} - {2b + 1\over (b - 1)↑2} + {2b + 2\over (b - 1)↑2}
\left(1\over b\right)↑n - {1\over (b - 1)↑2} \left(1\over b\right)↑{2n}\bigggrp
\,.\hfill(7)\cr}}$$
So the number of carries is just slightly less
than ${1\over 2}n$ under these assumptions.
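Formula (6) is easy to check empirically. The following hedged Python
sketch counts carries in random $n$-place additions and compares the
sample mean with (6); the agreement is close for a moderate number of
trials:

    import random

    def mean_carries(b, n, trials=100000):
        total = 0
        for _ in range(trials):
            k = 0
            for _ in range(n):             # add digit pairs, right to left
                s = random.randrange(b) + random.randrange(b) + k
                k = s // b                 # carry into the next position
                total += k                 # count every carry, incl. the top
        return total / trials

    b, n = 10, 8
    print(mean_carries(b, n))                          # about 3.94
    print(0.5 * (n - (1 - (1 / b)**n) / (b - 1)))      # Eq. (6): 3.9444...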
% Vicinity of page 263
% folio 357 galley 5 *
\subsectionbegin{History and bibliography}
The early history of the classical algorithms described in this
section is left as an interesting project for the reader, and
only the history of their implementation on computers will be
traced here.
The use of $10↑n$ as an assumed radix when multiplying large
numbers on a desk calculator was discussed by D. N. \α{Lehmer} and
J. P. \α{Ballantine,} {\sl AMM \bf 30} (1923), 67--69.
Double-precision arithmetic on digital computers was first treated by
J. \α{von Neumann} and H. H. \α{Goldstine} in their introductory
notes on programming, originally published in 1947 [J. von Neumann, {\sl Collected
Works \bf 5}, 142--151]. Theorems A and↔B above are due to D.
A. \α{Pope} and M. L. \α{Stein} [{\sl CACM \bf 3} (1960), 652--654],
whose paper also contains a bibliography of earlier work on
double-precision routines. Other ways of choosing the trial
quotient $\A q$ have been discussed by A. G. \α{Cox} and H.
A. \α{Luther,} {\sl CACM \bf 4} (1961), 353 [divide by $v↓1 + 1$
instead of $v↓1$], and by M. L. \α{Stein,} {\sl CACM \bf 7} (1964),
472--474 [divide by $v↓1$ or $v↓1 + 1$ according to the magnitude
of $v↓2$]; E. V. \α{Krishnamurthy} [{\sl CACM \bf 8} (1965), 179--181]
showed that examination of the single-precision remainder in
the latter method leads to an improvement over Theorem↔B.\xskip Krishnamurthy
and \α{Nandi} [{\sl CACM \bf 10} (1967), 809--813] suggested a way
to replace the normalization and unnormalization operations of Algorithm↔D
by a calculation of $\A q$ based on several leading digits
of the operands. G. E. \α{Collins} and D. R. \α{Musser} have carried out
an interesting analysis of the original Pope and Stein algorithm
[{\sl Inf.\ Proc.\ Letters \bf6} (1977), 151--155].
Several alternative methods for division have also been suggested:
\yskip\textindent{1)}``\α{Fourier division}'' [J. \α{Fourier,} {\sl Analyse des
\'equations d\'etermin\'ees} (Paris, 1831), $\section2.21$].\xskip
This method, which was often used on desk calculators,
essentially obtains each new quotient digit by increasing the
precision of the divisor and the dividend at each step. Some
rather extensive tests by the author have indicated that such a
method is inferior to the divide-and-correct technique
above, but there may be some applications in which Fourier division
is practical. See D. H. \α{Lehmer,} {\sl AMM \bf 33} (1926), 198--206;
J. V. \α{Uspensky,} {\sl Theory of Equations} (New York: McGraw-Hill,
1948), 159--164.
\yskip\textindent{2)}``\α{Newton's method}'' for evaluating the \α{reciprocal} of a number
was extensively used in early computers when there was no single-precision
division instruction. The idea is to find some initial approximation
$x↓0$ to the number $1/v$, then to let $x↓{n+1} = 2x↓n - vx↑{2}↓{n}$.
This method converges rapidly to $1/v$, since $x↓n = (1 - ε)/v$
implies that $x↓{n+1} = (1 - ε↑2)/v$. Convergence to third order,
i.e., with $ε$ replaced by $O(ε↑3)$ at each step, can be obtained
using the formula
$$\eqalign{x↓{n+1} ⊗= x↓n + x↓n(1 - vx↓n) + x↓n(1 - vx↓n)↑2\cr
⊗ = x↓n\,\biglp 1 + (1 - vx↓n)(1 + (1 - vx↓n))\bigrp ,\cr}$$
and similar formulas hold for fourth-order convergence,
etc.; see P. \α{Rabinowitz}, {\sl CACM \bf 4} (1961),
98. For calculations on extremely large numbers, Newton's second-order
method and subsequent multiplication by $u$ can actually be considerably
faster than Algorithm↔D\null, if we increase the precision of $x↓n$
at each step and if we also use the fast multiplication routines
of Section 4.3.3.\xskip (See Algorithm 4.3.3D for details.)\xskip Some related
iterative schemes have been discussed by E. V. \α{Krishnamurthy},
{\sl IEEE Trans.\ \bf C--19} (1970), 227--231.\xskip (A small
demonstration of the second-order iteration appears after this list.)
\yskip\textindent{3)}Division methods have also been based on the evaluation of
$${u\over v + ε} = {u\over v}\left(1- \left(ε\over
v\right) + \left(ε\over v\right)↑2 - \left(ε\over v\right)↑3 + \cdots\right)\,.$$
See H. H. \α{Laughlin,} {\sl AMM \bf 37} (1930), 287--293.
We have used this idea in the double-precision case (Eq.\ 4.2.3--3).
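The second-order convergence of the iteration in paragraph (2) can be
demonstrated in a few lines. This hedged Python sketch works with exact
rationals purely to exhibit the squaring of the error; it is not
Algorithm 4.3.3D:

    from fractions import Fraction

    def newton_reciprocal(v, x0, steps):
        # x <- 2x - v*x*x: if x = (1 - e)/v then the new x is (1 - e*e)/v.
        x = Fraction(x0)
        for _ in range(steps):
            x = 2 * x - v * x * x
        return x

    x = newton_reciprocal(7, Fraction(1, 8), 5)   # crude start below 1/7
    print(float(1 - 7 * x))                       # error is (1/8)**32, tiny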
\yskip Besides the references
just cited, the following early articles concerning multiple-precision
arithmetic are of interest: High-precision routines for floating point
calculations using \α{ones' complement} arithmetic are described by A. H. \α{Stroud}
and D. \α{Secrest,} {\sl Comp.\ J. \bf 6} (1963), 62--66. Extended-precision
subroutines for use in {\:b FORTRAN} programs are described by B.
I. \α{Blum,} {\sl CACM \bf 8} (1965), 318--320; and for use in {\:b ALGOL}
by M. \α{Tienari} and V. \α{Suokonautio,} {\sl BIT \bf 6} (1966), 332--338.
Arithmetic on integers with {\sl unlimited} precision, making
use of linked memory allocation techniques, has been elegantly
described by G. E. \α{Collins,} {\sl CACM \bf 9} (1966), 578--589.
For a much larger repertoire of operations, including logarithms
and trigonometric functions, see R. P. \α{Brent,} {\sl ACM Trans.\
Math.\ Software \bf4} (1978), 57--81.
We have restricted our discussion in this section to arithmetic
techniques for use in computer programming. There are many algorithms
for {\sl \β{hardware}} implementation of arithmetic operations that
are very interesting, but they appear to be inapplicable to
computer programs for high-precision numbers; for example, see
G. W. \α{Reitwiesner,} ``Binary Arithmetic,'' {\sl Advances in Com\-puters
\bf 1} (New York: Academic Press, 1960), 231--308; O. L. \α{MacSorley},
{\sl Proc.\ IRE \bf 49} (1961), 67--91; G. \α{Metze,} {\sl IRE Trans.\
\bf EC--11} (1962), 761--764; H. L. \α{Garner,} ``Number Systems
and Arithmetic,'' {\sl Advances in Computers \bf 6} (New York:
Academic Press, 1965), 131--194. The minimum achievable execution
time for hardware addition and multiplication operations has
been investigated by S. \α{Winograd,} {\sl JACM \bf 12} (1965),
277--285, {\bf 14} (1967), 793--802; by R. P. \α{Brent},
{\sl IEEE Trans.\ \bf C--19} (1970), 758--759;
and by R. W. \α{Floyd,} {\sl
IEEE Symp.\ Found.\ Comp.\ Sci.\ \bf 16} (1975), 3--5.
\exbegin{EXERCISES}
\exno 1. [42] Study
the early history of the classical algorithms for arithmetic
by looking up the writings of, say, \α{Sun Ts\u u}, \α{al-Khw\A
arizm\A\i}, \α{al-Uql\A\i dis\A\i},
\α{Fibonacci}, and Robert \α{Recorde}, and by translating
their methods as faithfully as possible into precise algorithmic
notation.
\exno 2. [15] Generalize Algorithm↔A so that it does ``column
addition,'' i.e., obtains the sum of $m$ nonnegative $n$-place
integers.\xskip (Assume that $m ≤ b$.)
\exno 3. [21] Write a \MIX\ program
for the algorithm of exercise↔2, and estimate its running time
as a function of $m$ and $n$.
\exno 4. [M21] Give a formal \β{proof}
of the validity of Algorithm↔A\null, using the method of ``\β{inductive
assertions}'' explained in Section 1.2.1.
\exno 5. [21] Algorithm A adds the two inputs by going from
right to left, but sometimes the data is more readily accessible
from left to right. Design an algorithm that produces the same
answer as Algorithm↔A\null, but that generates the digits of the
answer from left to right, going back to change previous
values if a carry occurs to make a previous value incorrect.\xskip
[{\sl Note:} Early \α{Hindu} and \α{Arabic} manuscripts dealt with
addition from left to right in this way, probably because it was customary
to work from left to right on an abacus; the right-to-left
addition algorithm was a refinement due to \α{al-Uql\A\i dis\A\i},
perhaps because Arabic is written from right to left.]
\trexno 6. [22] Design an algorithm that adds from left to right
(as in exercise↔5), but your algorithm should not store a digit of the
answer until this digit cannot possibly be affected by future
carries; there is to be no changing of any answer digit once
it has been stored.\xskip [{\sl Hint:} Keep track of the number
of consecutive $(b - 1)$'s that have not yet been stored in
the answer.]\xskip This sort of algorithm would be appropriate, for
example, in a situation where the input and output numbers are
to be read and written from left to right on magnetic tapes, or
\inx{linked memory}
if they appear in straight \α{linear lists}.
\exno 7. [M26] Determine the average number of times the algorithm
of exercise↔5 will find that a \α{carry} makes it necessary to go
back and change $k$ digits of the partial answer, for $k = 1$, 2,
$\ldotss$, $n$.\xskip (Assume that both inputs are independently and
uniformly distributed between 0 and $b↑n - 1$.)
\exno 8. [M26] Write a \MIX\ program
for the algorithm of exercise↔5, and determine its average running
time based on the expected number of carries as computed in
the text.
\trexno 9. [21] Generalize Algorithm↔A to obtain an algorithm
that adds two $n$-place numbers in a {\sl \α{mixed-radix}} number
\inx{addition, mixed-radix}
system, with bases $b↓0$, $b↓1$, $\ldots$ (from right to left).
Thus the least significant digits lie between 0 and $b↓0 - 1$,
the next digits lie between 0 and $b↓1 - 1$, etc.; cf.\ Eq.\ 4.1--9.
% Vicinity of page 266
% folio 360 galley 6a *
\exno 10. [18] Would
Program S work properly if the instructions on lines 06 and
07 were interchanged? If the instructions on lines 05 and 06
were interchanged?
\exno 11. [10] Design an algorithm that compares two nonnegative
$n$-place integers $u = (u↓1u↓2 \ldotsm u↓n)↓b$ and $v = (v↓1v↓2 \ldotsm
v↓n)↓b$, to determine whether $u < v$, $u = v$, or
$u > v$.
\exno 12. [16] Algorithm S assumes that we know which of the
two input operands is the larger; if this information is not
known, we could go ahead and perform the subtraction anyway,
and we would find that an extra ``\α{borrow}'' is still present
\inx{subtraction}
at the end of the algorithm. Design another algorithm that
could be used (if there is a ``borrow'' present at the end of
Algorithm↔S) to complement $(w↓1w↓2 \ldotsm w↓n)↓b$ and therefore
to obtain the absolute value of the difference of $u$ and $v$.
\exno 13. [21] Write a \MIX\ program that multiplies $(u↓1u↓2
\ldotsm u↓n)↓b$ by $v$, where $v$ is a single-precision number
(i.e., $0 ≤ v < b)$, producing the answer $(w↓0w↓1 \ldotsm w↓n)↓b$.
How much running time is required?
\trexno 14. [M24] Give a formal proof of the validity of Algorithm↔M\null,
using the method of ``inductive assertions'' explained
in Section 1.2.1.
\exno 15. [M20] If we wish to form the product of two $n$-place
fractions, $(.u↓1u↓2 \ldotsm u↓n)↓b \times (.v↓1v↓2 \ldotsm v↓n)↓b$,
and to obtain only an $n$-place approximation $(.w↓1w↓2 \ldotsm
w↓n)↓b$ to the result, Algorithm↔M could be used to obtain a
\inx{multiplication of fractions}
$2n$-place answer that is subsequently rounded to the desired approximation.
But this involves about twice as much work as is necessary for
reasonable accuracy, since the products $u↓iv↓j$ for $i + j
> n + 2$ contribute very little to the answer.
Give an estimate of the maximum error that can
occur, if these products $u↓iv↓j$ for $i + j > n + 2$ are not
computed during the multiplication, but are assumed to be zero.
\trexno 16. [20] Design an algorithm
\inxf{division}
that divides a nonnegative $n$-place integer $(u↓1u↓2 \ldotsm
u↓n)↓b$ by $v$, where $v$ is a single-precision number (i.e., $0
< v < b$), producing the quotient $(w↓1w↓2 \ldotsm w↓n)↓b$ and remainder↔$r$.
\exno 17. [M20] In the notation of Fig.\ 6, assume that $v↓1
≥ \lfloor b/2\rfloor $; show that if $u↓0 = v↓1$, we must have
$q = b - 1$ or $b - 2$.
\exno 18. [M20] In the notation of Fig.\ 6, show that if $q↑\prime
= \lfloor (u↓0b + u↓1)/(v↓1 + 1)\rfloor $, then $q↑\prime ≤
q$.
\trexno 19. [M21] In the notation of Fig.\ 6, let $\A q$ be
an approximation to $q$, and let $\A r = u↓0b + u↓1 - \A q
v↓1$. Assume that $v↓1 > 0$. Show that if $v↓2\A q >
b\A r + u↓2$, then $q < \A q$.\xskip [{\sl Hint:} Strengthen
the proof of Theorem↔A by examining the influence of $v↓2$.]
\exno 20. [M22] Using the notation and assumptions of
exercise↔19, show that if $v↓2\A q ≤ b\A r + u↓2$, then $\A q
= q$ or $q = \A q - 1$.
\trexno 21. [M23] Show that if $v↓1 ≥ \lfloor b/2\rfloor $, and
if $v↓2\A q ≤ b\A r + u↓2$ but $\A q ≠ q$ in
the notation of exercises 19 and↔20, then $u \mod v ≥ (1 - 2/b)v$.\xskip
(The latter event occurs with approximate probability $2/b$,
so that when $b$ is the word size of a computer we must have
$q↓j = \A q$ in Algorithm↔D except in very rare circumstances.)
\trexno 22. [24] Find an example of a four-digit number divided
by a three-digit number for which step D6 is necessary in Algorithm↔D\null,
when the radix $b$ is 10.
\exno 23. [M23] Given that $v$ and $b$ are integers, and that
$1 ≤ v < b$, prove that we always have $\lfloor b/2\rfloor ≤ v\lfloor b/(v
+ 1)\rfloor < (v + 1)\lfloor b/(v + 1)\rfloor ≤ b$.
\exno 24. [M20] Using the law of the distribution of leading
digits explained in Section 4.2.4, give an approximate formula
for the probability that $d = 1$ in Algorithm↔D.\xskip (When $d =
1$, it is, of course, possible to omit most of the calculation
in steps D1 and↔D8.)
\exno 25. [26] Write a \MIX\ routine for step D1, which is needed
to complete Program↔D.
\exno 26. [21] Write a \MIX\ routine for step D8, which is needed
to complete Program↔D.
\exno 27. [M20] Prove that at the beginning of step D8 in
Algorithm↔D\null, the unnormalized remainder
$(u↓{m+1}u↓{m+2} \ldotsm u↓{m+n})↓b$ is always an exact
multiple of $d$.
\exno 28. [M30] (A. \α{Svoboda,} {\sl Stroje na Zpracov\'an\'\i\
Informac\'\i\ \bf 9} (1963), 25--32.)\xskip Let $v = (v↓1v↓2
\ldotsm v↓n)↓b$ be any radix $b$ integer, where $v↓1 ≠ 0$. Perform
the following operations:
\yskip\hang\textindent{\bf N1.}If $v↓1 < b/2$, multiply $v$ by $\lfloor
(b + 1)/(v↓1 + 1)\rfloor$. Let the result of this step be $(v↓0v↓1v↓2
\ldotsm v↓n)↓b$.
\yskip\hang\textindent{\bf N2.}If $v↓0 = 0$,
set $v ← v + (1/b)\lfloor b(b - v↓1)/(v↓1
+ 1)\rfloor v$; let the result of this step be $(v↓0v↓1v↓2 \ldotsm
v↓n.v↓{n+1}\ldotsm)↓b$. Repeat step N2 until $v↓0 ≠ 0$.
\yskip\noindent Prove that step N2 will be performed
at most three times, and that we must always have $v↓0 = 1$,
$v↓1 = 0$ at the end of the calculations.
[{\sl Note:} If $u$ and $v$ are both multiplied
by the above constants, we do not change the value of the quotient
$u/v$, and the divisor has been converted into the form $(10v↓2
\ldotsm v↓n.v↓{n+1}v↓{n+2}v↓{n+3})↓b$. This form of the divisor
is very convenient because, in the notation of
Algorithm↔D\null, we may simply take $\A q = u↓j$ as a trial quotient at
the beginning of step D3, or $\A q = b - 1$ when $(u↓{j-1},
u↓j) = (1, 0)$.]
\exno 29. [15] Prove or disprove: At the beginning of step D7
of Algorithm↔D\null, we always have $u↓j = 0$.
\trexno 30. [22] If memory space is limited, it may be desirable
to use the same storage locations for both input and output
during the performance of some of the algorithms in this section.
Is it possible to have $w↓1$, $\ldotss$, $w↓n$ stored in the same
respective locations as $u↓1$, $\ldotss$, $u↓n$ or $v↓1$, $\ldotss$,
$v↓n$ during Algorithm A or↔S\null? Is it possible to have the quotient
$q↓0$, $\ldotss$,
$q↓m$ occupy the same locations as $u↓0$, $\ldotss$, $u↓m$ in
Algorithm↔D\null? Is there any permissible overlap of memory locations between
input and output in Algorithm↔M?
\exno 31. [28] Assume that $b = 3$ and that $u = (u↓1 \ldotsm
u↓{m+n})↓3$, $v = (v↓1 \ldotsm v↓n)↓3$ are integers in {\sl\α{balanced
ternary}} notation (cf.\ Section 4.1), $v↓1 ≠ 0$. Design a long-division
\inx{division, balanced ternary}
algorithm that divides $u$ by $v$, obtaining a remainder whose
absolute value does not exceed ${1\over 2}|v|$. Try to find
an algorithm that would be efficient if incorporated into the
arithmetic circuitry of a balanced ternary computer.
\exno 32. [M40] Assume that $b = 2i$ and that $u$ and $v$ are
complex numbers expressed in the \α{quater-imaginary number system}.
\inx{division, quater-imaginary}
Design algorithms that divide $u$ by $v$, perhaps obtaining
a suitable remainder of some sort, and compare their efficiency.\xskip
[{\sl References:} M. \α{Nadler,} {\sl CACM \bf 4} (1961),
192--193; Z. \α{Pawlak} and A. \α{Wakulicz,} {\sl Bull.\ de l'Acad.\ Polonaise
des Sciences}, Classe III, {\bf 5} (1957), 233--236 (see also
pp. 803--804); and exercise 4.1--15.]
\exno 33. [M40] Design an algorithm for taking \α{square roots},
analogous to Algorithm↔D and to the traditional pencil-and-paper method
for extracting square roots.
\exno 34. [40] Develop a set of computer subroutines for doing
the four arithmetic operations on arbitrary integers, putting
no constraint on the size of the integers except for the implicit
assumption that the total memory capacity of the computer should
not be exceeded.\xskip (Use \α{linked memory} allocation, so that no time
is wasted in finding room to put the results.)
\exno 35. [40] Develop a set of computer subroutines for ``\α{decuple-precision}
\α{floating point}'' arithmetic, using excess 0, base $b$, nine-place
floating point number representation, where $b$ is the computer
word size, and allowing a full word for the exponent.\xskip (Thus
each floating point number is represented in 10 words of memory,
and all scaling is done by moving full words instead of by shifting
within the words.)
\exno 36. [M42] Compute the values of the fundamental constants
listed in Appendix B to much higher precision than the 40-place
values listed there.\xskip [{\sl Note:} The first 100,000 digits
\inx{pi}
of the decimal expansion of $π$ were published by D. \α{Shanks}
and J. W. \α{Wrench,} Jr., in {\sl Math.\ Comp.\ \bf 16} (1962), 76--99. One
million digits of $π$ were computed by Jean \α{Guilloud} and
Martine \α{Bouyer} of the French Atomic Energy Commission in 1974.]
\trexno 37. [20] (E. \α{Salamin}.)\xskip Explain how to avoid the
normalization and unnormalization steps of Algorithm↔D\null, when $d$ is a
power of 2 on a binary computer, without changing the sequence of trial
quotient digits computed by that algorithm.\xskip (How can $\A q$ be computed
in step D3 if the normalization of step D1 hasn't been done?)
% Vicinity of page 268
% folio 362 galley 6b *
\runningrighthead{MODULAR ARITHMETIC}
\section{4.3.2}
\sectionskip
\sectionbegin{\star4.3.2. \β{Modular Arithmetic}}
Another interesting alternative is available
for doing arithmetic on large integers, based on some
simple principles of number theory. The idea is to have several
``moduli'' $m↓1$, $m↓2$, $\ldotss$, $m↓r$ that contain no common
factors, and to work indirectly with ``residues'' $u \mod m↓1$,
$u \mod m↓2$, $\ldotss$, $u \mod m↓r$ instead of directly with the
number $u$.
For convenience in notation throughout this section, let
$$u↓1 = u \mod m↓1,\qquad u↓2 = u \mod m↓2,\qquad \ldotss
,\qquad u↓r = u \mod m↓r.\eqno (1)$$
It is easy to compute $(u↓1, u↓2, \ldotss , u↓r)$
from an integer $u$ by means of division; and it is important to note
that no information is lost in this process, since we
can recompute $u$ from $(u↓1, u↓2, \ldotss , u↓r)$ provided
that $u$ is not too large. For example, if $0 ≤ u <
v ≤ 1000$, it is impossible to have $(u \mod 7, u \mod 11,
u \mod 13)$ equal to $(v \mod 7, v \mod 11, v \mod 13)$. This
is a consequence of the ``\β{Chinese remainder theorem}'' stated
below.
We may therefore regard $(u↓1, u↓2, \ldotss , u↓r)$ as a new
type of internal computer representation, a ``modular representation,''
of the integer $u$.
The advantages of a modular representation
are that addition, subtraction, and multiplication are very
simple:
$$\baselineskip 15pt
\eqalignno{(u↓1, \ldotss , u↓r) + (v↓1, \ldotss, v↓r)⊗= \biglp (u↓1 + v↓1)\mod
m↓1, \ldotss , (u↓r + v↓r)\mod m↓r\bigrp ,⊗(2)\cr
(u↓1, \ldotss , u↓r) - (v↓1, \ldotss , v↓r)⊗= \biglp
(u↓1 - v↓1)\mod m↓1, \ldotss , (u↓r - v↓r)\mod m↓r\bigrp
,⊗(3)\cr
(u↓1, \ldotss , u↓r) \times (v↓1, \ldotss , v↓r)
⊗= \biglp (u↓1 \times v↓1)\mod m↓1, \ldotss , (u↓r \times v↓r)\mod
m↓r\bigrp .⊗(4)\cr}$$
To derive (4), for example, we need to show that
$$uv \mod m↓j = (u \mod m↓j)(v
\mod m↓j)\mod m↓j$$ for each modulus $m↓j$. But this is a basic
fact of elementary number theory: $x \mod m↓j = y \mod m↓j$
if and only if $x ≡ y\modulo {m↓j}$; furthermore if $x ≡ x↑\prime$
and $y ≡ y↑\prime $, then $xy ≡ x↑\prime y↑\prime \modulo {m↓j}$;
hence $(u \mod m↓j)(v \mod m↓j)≡uv\modulo{m↓j}$.
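For concreteness, these componentwise rules can be sketched in a few lines of
Python (the helper names here are ours, not part of the text), using the
moduli 7, 11, 13 mentioned above:

    def to_residues(u, ms):
        # modular representation (1): u -> (u mod m1, ..., u mod mr)
        return tuple(u % m for m in ms)

    def mod_add(us, vs, ms):   # rule (2)
        return tuple((a + b) % m for a, b, m in zip(us, vs, ms))

    def mod_sub(us, vs, ms):   # rule (3)
        return tuple((a - b) % m for a, b, m in zip(us, vs, ms))

    def mod_mul(us, vs, ms):   # rule (4)
        return tuple((a * b) % m for a, b, m in zip(us, vs, ms))

    ms = (7, 11, 13)           # pairwise relatively prime, m = 1001
    u, v = 100, 9              # uv = 900 < 1001, so the product stays in range
    assert mod_mul(to_residues(u, ms), to_residues(v, ms), ms) == to_residues(u * v, ms)
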
% Vicinity of page 269
% folio 363 galley 7 *
The disadvantages of a
modular representation are that it is comparatively difficult
to test whether a number is positive or negative or to test
whether or not $(u↓1, \ldotss , u↓r)$ is greater than $(v↓1,
\ldotss,v↓r)$. It is also difficult to test whether or not overflow
has occurred as the result of an addition, subtraction, or multiplication,
and it is even more difficult to perform division. When these
operations are required frequently in conjunction with addition,
subtraction, and multiplication, the use of modular arithmetic
can be justified only if fast means of conversion into and out
of the modular representation are available. Therefore conversion
between modular and positional notation is one of the principal
topics of interest to us in this section.
The processes of addition, subtraction, and multiplication using
(2), (3), and (4) are called residue arithmetic or {\sl modular
arithmetic.} The range of numbers that can be handled by modular
arithmetic is equal to $m = m↓1m↓2 \ldotsm m↓r$, the product
of the moduli. Therefore we see that the amount of time required
to add, subtract, or multiply $n$-digit numbers using modular
arithmetic is essentially proportional to $n$ (not counting
the time to convert in and out of modular representation). This
is no advantage at all when addition and subtraction are considered,
but it can be a considerable advantage with respect to multiplication
since the conventional method of the preceding section requires
an execution time proportional to $n↑2$.
Moreover, on a computer that allows many
operations to take place simultaneously, modular arithmetic
can be a significant advantage even for addition and subtraction;
the operations with respect to different moduli can all be done
at the same time, so we obtain a substantial increase in speed.
The same kind of decrease in execution time could not be achieved
by the conventional techniques discussed in the previous section,
since carry propagation must be considered. Perhaps some day
highly \α{parallel computers} will make simultaneous operations
commonplace, so that modular arithmetic will be of significant
importance in ``\α{real-time}'' calculations when a quick answer
to a single problem requiring high precision is needed.\xskip (With
highly parallel computers, it is often preferable to run $k$ {\sl
separate} programs simultaneously, instead of running a {\sl
single} program $k$ times as fast, since the latter alternative
is more complicated but does not utilize the machine any more
efficiently. ``Real-time'' calculations are exceptions that
make the inherent parallelism of modular arithmetic more significant.)
Now let us examine the basic fact that underlies the modular
representation of numbers:
\algbegin Theorem C (Chinese Remainder Theorem). {\sl Let
$m↓1$, $m↓2$, $\ldotss$, $m↓r$ be positive integers that are relatively
prime in pairs, i.e.,}
$$\gcd(m↓j, m↓k) = 1\qquad\hbox{when }j ≠ k.\eqno (5)$$
{\sl Let $m = m↓1m↓2 \ldotsm m↓r$, and let $a$, $u↓1$, $u↓2$,
$\ldotss$, $u↓r$ be integers. Then there is exactly one integer
$u$ that satisfies the conditions}
$$a ≤ u < a + m,\qquad\hbox{and}\qquad u ≡ u↓j\modulo{m↓j}\hbox{ for }
1 ≤ j ≤ r.\eqno (6)$$
\dproofbegin If $u ≡ v \modulo{m↓j}$ for $1
≤ j ≤ r$, then $u - v$ is a multiple of $m↓j$ for all $j$, so
(5) implies that $u - v$ is a multiple of $m = m↓1m↓2 \ldotsm
m↓r$. This argument shows that there is {\sl at most} one solution
of (6). To complete the proof we must now show the existence
of {\sl at least} one solution, and this can be done in two
simple ways:
\yyskip\noindent METHOD 1 (``Nonconstructive'' proof).\xskip As $u$ runs through
the $m$ distinct values $a ≤ u < a + m$, the $r$-tuples $(u
\mod m↓1, \ldotss , u \mod m↓r)$ must also run through $m$ distinct
values, since (6) has at most one solution. But there are exactly
$m↓1m↓2 \ldotsm m↓r$ possible $r$-tuples $(v↓1, \ldotss , v↓r)$
such that $0 ≤ v↓j < m↓j$. Therefore each $r$-tuple must occur
exactly once, and there must be some value of $u$ for which
$(u \mod m↓1, \ldotss , u \mod m↓r) = (u↓1, \ldotss , u↓r)$.
\yyskip\noindent METHOD 2 (``Constructive'' proof).\xskip We
can find numbers $M↓j$ for $1 ≤ j ≤ r$ such that
$$M↓j ≡ 1 \modulo{m↓j}\qquad\hbox{and}\qquad M↓j≡ 0\modulo {m↓k}\hbox{ for }
k ≠ j.\eqno (7)$$
This follows because (5) implies that $m↓j$ and
$m/m↓j$ are relatively prime, so we may take
$$M↓j = (m/m↓j)↑{\varphi(m↓j)}\eqno (8)$$
by \α{Euler's theorem} (exercise 1.2.4--28). Now the
number
$$u = a + \biglp (u↓1M↓1 + u↓2M↓2 + \cdots + u↓rM↓r - a)\mod
m\bigrp \eqno (9)$$
satisfies all the conditions of (6).\quad\blackslug
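Method 2 is easy to try out by machine; the following Python sketch computes
each $M↓j$ of (8), reduced mod $m$, and reconstructs $u$ by (9). It evaluates
$\varphi$ by brute force, so it is suitable only for very small moduli; the
function names are ours:

    from math import gcd

    def phi(n):
        # Euler's totient function, by brute force
        return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

    def crt(us, ms, a=0):
        m = 1
        for mj in ms:
            m *= mj
        Ms = [pow(m // mj, phi(mj), m) for mj in ms]    # (8), reduced mod m
        return a + (sum(uj * Mj for uj, Mj in zip(us, Ms)) - a) % m   # (9)

    ms = (7, 11, 13)
    assert crt((999 % 7, 999 % 11, 999 % 13), ms) == 999
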
\yyskip A very special case of this theorem
was stated by the Chinese mathematician \α{Sun} Ts\u u, who
gave a rule called t\'ai-yen (``great generalization'').
The date of his writing is very uncertain; it is thought to
be between 280 and 473 {\:m A.D.}\xskip [See Joseph \α{Needham},
{\sl Science and Civilization in China \bf 3} (Cambridge University
Press, 1959), 33--34, 119--120, for an interesting discussion.]\xskip Theorem↔C
was apparently first stated and proved in its proper generality
by \α{Chhin} Chiu↔Shao in his {\sl Shu Shu Chiu Chang} (1247). Numerous
early contributions to this theory have been summarized by L.
E. \α{Dickson} in his {\sl History of the Theory of Numbers \bf 2}
(New York: Chelsea, 1952), 57--64.
As a consequence of Theorem C, we may use modular representation
for numbers in any consecutive interval of $m = m↓1m↓2 \ldotsm
m↓r$ integers. For example, we could take $a = 0$ in (6), and
work only with nonnegative integers $u$ less than↔$m$. On the
other hand, when addition and subtraction are being done, as
well as multiplication, it is usually most convenient to assume
that all the moduli $m↓1$, $m↓2$, $\ldotss$, $m↓r$ are odd numbers,
so that $m = m↓1m↓2 \ldotsm m↓r$ is odd, and to work with integers
in the range
$$- {m\over 2} < u < {m\over 2},\eqno (10)$$
which is completely symmetrical about zero.
To perform the basic operations listed in (2), (3), and (4),
we need to compute $(u↓j + v↓j)\mod m↓j$, $(u↓j - v↓j)\mod m↓j$,
and $u↓jv↓j \mod m↓j$, when $0 ≤ u↓j, v↓j < m↓j$. If $m↓j$
is a single-precision number, it is most convenient to form
$u↓jv↓j \mod m↓j$ by doing a multiplication and then a division
operation. For addition and subtraction, the situation is a
\inx{addition, mod m}
\inx{subtraction, mod m}
little simpler, since no division is necessary; the following
formulas may conveniently be used:
$$\eqalignno{(u↓j+v↓j)\mod m↓j⊗=\left\{\vcenter{\halign{$#,$\hfill\qquad
⊗if $#$\hfill\cr
u↓j+v↓j⊗u↓j+v↓j<m↓j;\cr
u↓j+v↓j-m↓j⊗u↓j+v↓j≥m↓j.\cr}}\right.⊗(11)\cr
\noalign{\vskip 4pt}
(u↓j-v↓j)\mod m↓j⊗=\left\{\vcenter{\halign{$#,$\hfill\qquad
⊗if $#$\hfill\cr
u↓j-v↓j⊗u↓j-v↓j≥0;\cr
u↓j-v↓j+m↓j⊗u↓j-v↓j<0.\cr}}\right.⊗(12)\cr}$$
(Cf.\ Section 3.2.1.1.)\xskip In this case, since we
want $m$ to be as large as possible, it is easiest to let $m↓1$
be the largest odd number that fits in a computer word, to let
$m↓2$ be the largest odd number $< m↓1$ that is relatively prime
to↔$m↓1$, to let $m↓3$ be the largest odd number $< m↓2$ that
is relatively prime to both $m↓1$ and↔$m↓2$, and so on until
enough $m↓j$'s have been found to give the desired range↔$m$.
Efficient ways to determine whether or not two integers are
relatively prime are discussed in Section 4.5.2.
As a simple example, suppose that we have a decimal computer
whose words hold only two digits, so that the
word size is 100. Then the procedure described in
the previous paragraph would give
$$m↓1 = 99,\quad m↓2 = 97,\quad m↓3 = 95,\quad m↓4 = 91,\quad
m↓5 = 89,\quad m↓6 = 83,\eqno (13)$$
and so on.
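This greedy selection is easily mechanized; here is a Python sketch that
reproduces (13) for word size 100 (the function name is ours):

    from math import gcd

    def choose_moduli(wordsize, r):
        # largest odd moduli that fit in a word, pairwise relatively prime
        ms, c = [], wordsize - 1 if wordsize % 2 == 0 else wordsize - 2
        while len(ms) < r and c > 1:
            if all(gcd(c, m) == 1 for m in ms):
                ms.append(c)
            c -= 2
        return ms

    assert choose_moduli(100, 6) == [99, 97, 95, 91, 89, 83]   # cf. (13)
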
\penalty-100 % break here rather than before that trailer line.
% Vicinity of page 271
% folio 366 galley 8 *
On binary computers it is sometimes desirable to choose the
$m↓j$ in a different way, by selecting
$$m↓j = 2↑{e↓j}- 1.\eqno (14)$$
In other words, each modulus is one less than a
power of 2. Such a choice of $m↓j$ often makes the basic arithmetic
operations simpler, because it is relatively easy to work modulo
$2↑{e↓j} - 1$, as in \α{ones' complement} arithmetic. When
the moduli are chosen according to this strategy, it is helpful
to relax the condition $0 ≤ u↓j < m↓j$ slightly, so that we
require only
$$0 ≤ u↓j < 2↑{e↓j},\qquad u↓j ≡ u\modulo{2↑{e↓j}-1}.\eqno(15)$$
Thus, the value $u↓j = m↓j = 2↑{e↓j}-1$
is allowed as an optional alternative to $u↓j = 0$, since
this does not affect the validity of Theorem↔C\null, and it means
we are allowing $u↓j$ to be any $e↓{j\,}$-bit binary number. Under
this assumption, the operations of addition and multiplication
modulo $m↓j$ become the following:
$$\eqalignno{u↓j \oplus v↓j ⊗ = \left\{\vcenter{\halign{$#,$\hfill\qquad
⊗if $#$\hfill\cr
u↓j+v↓j⊗u↓j+v↓j<2↑{e↓j};\cr
\biglp(u↓j+v↓j)\mod 2↑{e↓j}\bigrp+1⊗u↓j+v↓j≥2↑{e↓j}.\cr}}\right.⊗(16)\cr
\noalign{\vskip 6pt}
u↓j \otimes v↓j ⊗= (u↓jv↓j \mod
2↑{e↓j})\;\oplus\;\lfloor u↓jv↓j/2↑{e↓j}\rfloor .⊗(17)\cr}$$
$\biglp$Here $\oplus$ and $\otimes$ refer to the operations
done on the individual components of $(u↓1, \ldotss, u↓r)$
and $(v↓1, \ldotss , v↓r)$ when adding or multiplying, respectively,
using the convention (15).$\bigrp$ Equation (12) may be used for subtraction.
These operations can be performed efficiently even when
$m↓j$ is larger than the computer's word size, since it is a simple
matter to compute the remainder of a positive number modulo
a power of 2, or to divide a number by a power of 2. In (17)
we have the sum of the ``upper half'' and the ``lower half''
of the product, as discussed in exercise 3.2.1.1--8.
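One way these operations might look in Python, for a single component with
modulus $2↑e - 1$ under convention (15) (the names are ours):

    def oplus(u, v, e):
        # (16): addition mod 2^e - 1, result kept in the range 0 .. 2^e - 1
        s = u + v
        return s if s < (1 << e) else (s & ((1 << e) - 1)) + 1

    def otimes(u, v, e):
        # (17): sum of the upper and lower halves of the product
        p = u * v
        return oplus(p & ((1 << e) - 1), p >> e, e)

    e = 5
    for u in range(1 << e):
        for v in range(1 << e):
            assert otimes(u, v, e) % ((1 << e) - 1) == (u * v) % ((1 << e) - 1)
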
If moduli of the form $2↑{e↓j} - 1$ are to be used, we
must know under what conditions the number $2↑e - 1$ is relatively
prime to the number $2↑f - 1$. Fortunately, there is a very
simple rule,
$$\gcd(2↑e - 1, 2↑f - 1) = 2↑{\gcd(e,f)} - 1,\eqno (18)$$
which states in particular that {\sl $2↑e - 1$ and $2↑f
- 1$ are relatively prime if and only if $e$ and $f$ are relatively
prime}. Equation (18) follows from Euclid's algorithm and the
identity
$$(2↑e - 1)\mod(2↑f - 1) = 2↑{e\mod f} - 1.\eqno (19)$$
(See exercise↔6.)\xskip Thus we could choose for example
$m↓1 = 2↑{35} - 1$, $m↓2 = 2↑{34} - 1$, $m↓3 = 2↑{33} - 1$, $m↓4 =
2↑{31} - 1$, $m↓5 = 2↑{29} - 1$, if we had a computer with word
size $2↑{35}$; this would permit efficient addition, subtraction, and
multiplication of integers
in a range of size $m↓1m↓2m↓3m↓4m↓5 > 2↑{161}$.
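Identity (18), and the suitability of these five exponents, are easy to
verify mechanically, as in this small Python check:

    from math import gcd

    for e in range(1, 25):
        for f in range(1, 25):
            assert gcd(2**e - 1, 2**f - 1) == 2**gcd(e, f) - 1   # (18)

    es = [35, 34, 33, 31, 29]
    assert all(gcd(a, b) == 1 for a in es for b in es if a != b)
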
\yskip As we have already observed, the operations of conversion to
and from modular representation are very important. If we are
\inx{conversion of representations}
given a number $u$, its modular representation $(u↓1, \ldotss
, u↓r)$ may be obtained by simply dividing $u$ by $m↓1$, $\ldotss$, $m↓r$
and saving the remainders. A possibly more attractive procedure,
if $u = (v↓mv↓{m-1} \ldotsm v↓0)↓b$, is to evaluate the polynomial
$$\biglp\ldotss(v↓mb + v↓{m-1})b +\cdots\bigrp\,b + v↓0$$
using modular arithmetic. When $b = 2$ and when
the modulus $m↓j$ has the special form $2↑{e↓j} - 1$, both
of these methods reduce to quite a simple procedure:
Consider the binary representation of $u$ with blocks
of $e↓j$ bits grouped together,
$$u = a↓tA↑t + a↓{t-1}A↑{t-1} + \cdots + a↓1A + a↓0,\eqno(20)$$
where $A = 2↑{e↓j}$ and $0 ≤ a↓k < 2↑{e↓j}$
for $0 ≤ k ≤ t$. Then
$$u ≡ a↓t + a↓{t-1} + \cdots + a↓1 + a↓0\modulo{2↑{e↓j} - 1},\eqno (21)$$
since $A ≡ 1$, so we obtain $u↓j$ by
adding the $e↓{j\,}$-bit numbers $a↓t \oplus \cdots \oplus a↓1 \oplus
a↓0$, using (16). This process
is similar to the familiar device of ``\α{casting out nines}'' that
determines $u \mod 9$ when $u$ is expressed in the
decimal system.
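In Python this block-summing conversion might be sketched as follows; the
result lands in the relaxed range of (15), where the value $2↑e - 1$ stands
for zero (the function name is ours):

    def mod_mersenne(u, e):
        # u mod (2^e - 1) by summing e-bit blocks, as in (20)-(21)
        mask = (1 << e) - 1
        while u > mask:
            u = (u & mask) + (u >> e)   # each pass preserves u mod 2^e - 1
        return u                        # in the range 0 .. 2^e - 1

    assert mod_mersenne(12345678, 5) % 31 == 12345678 % 31
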
Conversion back from modular form to positional notation is
somewhat more difficult. It is interesting in this regard to
make a few side remarks about the way computers make us change
our viewpoint towards mathematical proofs: Theorem↔C tells us
\inx{proofs, constructive versus nonconstructive}
that the conversion from $(u↓1, \ldotss , u↓r)$ to $u$ is possible,
and two proofs are given. The first proof we considered is a
classical one that relies only on very simple concepts,
namely the facts that
\yskip\hang\textindent{i)}any number that is a multiple of $m↓1$, of
$m↓2$, $\ldotss $, and of $m↓r$, must be a multiple of $m↓1m↓2
\ldotsm m↓r$ when the $m↓j$'s are pairwise relatively prime;
and
\yskip\hang\textindent{ii)}if $m$ things are put into $m$ boxes with no two
things in the same box, then there must be one in each box.
\yskip\noindent By traditional notions of mathematical
aesthetics, this is no doubt the nicest proof of Theorem↔C;
but from a computational standpoint it is completely worthless.
It amounts to saying, ``Try $u = a$, $a + 1$, $\ldots$ until you
find a value for which $u ≡ u↓1\modulo{m↓1}$, $\ldotss$, $u ≡
u↓r\modulo{m↓r}$.''
The second proof of Theorem C is more explicit;
it shows how to compute $r$ new constants $M↓1$, $\ldotss$, $M↓r$,
and to get the solution in terms of these constants by formula
(9). This proof uses more complicated concepts (for example,
Euler's theorem), but it is much more satisfactory from a computational
standpoint, since the constants $M↓1$, $\ldotss$, $M↓r$ need to
be determined only once. On the other hand, the determination
of $M↓j$ by Eq.\ (8) is certainly not trivial, since the evaluation
of Euler's $\varphi $-function requires, in general, the factorization
\inx{Euler's totient function $\varphi(n)$}
of $m↓j$ into prime powers. Furthermore, $M↓j$ is likely to
be a terribly large number, even if we compute only the quantity
$M↓j \mod m$ $\biglp$which will work just as well as↔$M↓j$ in↔(9)$\bigrp$.
Since $M↓j \mod m$ is uniquely determined if (7) is to be satisfied
(because of the Chinese remainder theorem), we can see that,
in any event, Eq.\ (9) requires a lot of high-precision calculation,
and such calculation is just what we wished to avoid by modular
arithmetic in the first place.
So we need an even {\sl better\/} proof of Theorem↔C if we are
going to have a really usable method of conversion from $(u↓1,
\ldotss , u↓r)$ to $u$. Such a method was suggested by H. L.
\α{Garner} in 1958; it can be carried out using $r\choose2$ constants
$c↓{ij}$ for $1 ≤ i < j ≤ r$, where
$$c↓{ij}\,m↓i ≡ 1\quad\modulo {m↓j}.\eqno (22)$$
These constants $c↓{ij}$ are readily computed using
Euclid's algorithm, since for any given $i$ and $j$
Algorithm 4.5.2X will determine $a$ and $b$
such that $am↓i + bm↓j = \gcd(m↓i, m↓j) = 1$, and we may take
$c↓{ij} = a$. When the moduli have the special form $2↑{e↓j}
- 1$, a simple method of determining $c↓{ij}$ is given in
exercise↔6.
Once the $c↓{ij}$ have been determined satisfying
(22), we can set
$$\eqalign{v↓1⊗← u↓1 \mod m↓1,\cr
v↓2⊗← (u↓2 - v↓1)\,c↓{12} \mod m↓2,\cr
v↓3⊗← \biglp (u↓3 - v↓1)\,c↓{13} - v↓2\bigrp\,c↓{23}\mod m↓3,\cr
\vdots\cr
v↓r⊗← \biglp\ldotss ((u↓r - v↓1)\,c↓{1r}
- v↓2)\,c↓{2r} - \cdots - v↓{r-1}\bigrp\,c↓{(r-1)r} \mod m↓r.\cr}\eqno(23)$$
Then
$$u = v↓rm↓{r-1} \ldotsm m↓2m↓1 + \cdots + v↓3m↓2m↓1 + v↓2m↓1 +
v↓1\eqno (24)$$
is a number satisfying the conditions
$$0 ≤ u < m,\qquad u ≡ u↓j\modulo {m↓j}\quad\hbox{for }1≤j
≤ r.\eqno (25)$$
(See exercise 8; another way of rewriting (23)
that does not involve as many aux\-il\-iary constants is given
in exercise↔7.)\xskip Equation (24) is a {\sl mixed-radix representation}
of $u$, which may be converted to binary or decimal notation
using the methods of Section 4.4. If $0 ≤ u < m$ is not the
desired range, an appropriate multiple of $m$ can be added or
subtracted after the conversion process.
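A compact Python sketch of (23) and (24) follows. It uses Python's built-in
modular inverse, pow(m, -1, mj), where the text would apply Euclid's
algorithm (Algorithm 4.5.2X) to obtain the constants $c↓{ij}$; the function
name is ours:

    def garner(us, ms):
        # residues (u1,...,ur) -> u with 0 <= u < m1 m2 ... mr, via (23)-(24)
        r, v = len(ms), []
        for j in range(r):
            t = us[j] % ms[j]
            for i in range(j):
                # c_ij = inverse of m_i modulo m_j, as in (22)
                t = ((t - v[i]) * pow(ms[i], -1, ms[j])) % ms[j]
            v.append(t)
        u = v[-1]
        for j in range(r - 2, -1, -1):   # mixed-radix evaluation (24)
            u = u * ms[j] + v[j]
        return u

    assert garner((999 % 7, 999 % 11, 999 % 13), (7, 11, 13)) == 999
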
The advantage of the computation shown in (23) is that the calculation
of↔$v↓j$ can be done using only arithmetic mod $m↓j$, which
is already built into the modular arithmetic algorithms. Furthermore,
(23) allows parallel computation: We can start with $(v↓1, \ldotss
, v↓r) ← (u↓1 \mod m↓1, \ldotss , u↓r \mod m↓r)$, then at time
$j$ for $1 ≤ j < r$ we simultaneously set $v↓k ← (v↓k - v↓j)\,c↓{jk}
\mod m↓k$ for $j < k ≤ r$. An alternative way to compute the
mixed-radix representation, allowing similar possibilities for
parallelism, has been discussed by A. S. \α{Fraenkel,} {\sl Proc.\
ACM Nat.\ Conf.\ \bf 19} (Philadelphia, 1965), E1.4.
\def\\{\hskip-.8333pt}
It is important to observe that the mixed-radix representation
(24) is suffi\-cient to compare the magnitudes of two modular
numbers. For if we know that $0 ≤ u < m$ and $0 ≤ u↑\prime <
m$, then we can tell if $u < u↑\prime$ by first doing the conversion
to $(v↓1, \ldotss , v↓r)$ and $(v↑{\prime}↓{1}, \ldotss , v↑{\prime}↓{\\
r})$, then testing if $v↑{\null}↓{\\r}
< v↑{\prime}↓{\\r}$, or if $v↑{\null}↓{\\r} = v↑{\prime}↓{\\r}$
and $v↑{\null}↓{\\r-1} < v↑{\prime}↓{\\r-1}$, etc. It is not necessary
to convert all the way to binary or decimal notation if we only
want to know whether $(u↓1, \ldotss , u↓r)$ is less than $(u↑{\prime}↓{1},
\ldotss , u↑{\prime}↓{r})$.
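In terms of the garner sketch above, such a comparison might be programmed by
stopping as soon as the mixed-radix digits are in hand (the names are ours):

    def mixed_radix_digits(us, ms):
        # the digits v_1, ..., v_r of (23), as in the garner sketch
        r, v = len(ms), []
        for j in range(r):
            t = us[j] % ms[j]
            for i in range(j):
                t = ((t - v[i]) * pow(ms[i], -1, ms[j])) % ms[j]
            v.append(t)
        return v

    def mod_less(us, vs, ms):
        # compare v_r first, then v_{r-1}, etc., per the text
        return mixed_radix_digits(us, ms)[::-1] < mixed_radix_digits(vs, ms)[::-1]

    ms = (7, 11, 13)
    assert mod_less(tuple(100 % m for m in ms), tuple(200 % m for m in ms), ms)
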
The operation of comparing two numbers,
or of deciding if a modular number is negative, is intuitively
very simple, so we would expect to have a much easier method
for making this test than the conversion to mixed-radix form.
But the following theorem shows that there is little hope of
finding a substantially better method, since the range of a
modular number depends essentially on all bits of all the residues $(u↓1,\ldotss,
u↓r)$:
% Vicinity of page 274
% folio 370 galley 9 *
\algbegin Theorem S ({\rm Nicholas \α{Szab\'o},
1961}). {\sl In terms of the notation above, assume that
$m↓1 < \sqrt{m}$, and let $L$ be any value in
the range
$$m↓1 ≤ L ≤ m - m↓1.\eqno (26)$$
Let $g$ be any function such that the set $\{g(0),
g(1), \ldotss , g(m↓1 - 1)\}$ contains fewer than $m↓1$ values. Then
there are numbers $u$ and $v$ such that}
$$\tabskip 0pt plus 1000pt
\vbox{\halign to size{$\hfill\dispstyle#\hfill$⊗#\tabskip0pt\cr
g(u \mod m↓1) = g(v \mod m↓1),\qquad u \mod m↓j = v \mod
m↓j\hbox{\ \ for }2 ≤ j ≤ r;⊗(27)\cr
\noalign{\vskip 2pt}
0 ≤ u < L ≤ v < m.⊗(28)\cr}}$$
\dproofbegin By hypothesis, there must exist
numbers $u ≠ v$ satisfying (27), since $g$ must take on the
same value for two different residues. Let $(u, v)$ be a pair
of values with $0 ≤ u < v < m$ satisfying (27), for which $u$
is a minimum. Since $u↑\prime = u - m↓1$ and $v↑\prime = v -
m↓1$ also satisfy (27), we must have $u↑\prime < 0$ by the minimality
of $u$. Hence $u < m↓1 ≤ L$; and if (28) does not hold, we must
have $v < L$. But $v > u$, and $v - u$ is a multiple of $m↓2
\ldotsm m↓r = m/m↓1$, so $v ≥ v - u ≥ m/m↓1 > m↓1$. Therefore,
if (28) does not hold for $(u, v)$, it will be satisfied for
the pair $(u↑{\prime\prime}, v↑{\prime\prime})
= (v - m↓1, u + m - m↓1)$.\quad\blackslug
\yyskip Of course, a similar result can
be proved for any $m↓j$ in place of $m↓1$; and we could also
replace (28) by the condition ``$a ≤ u < a + L ≤ v < a + m$''
with only minor changes in the proof. Therefore Theorem↔S shows
that many simple functions cannot be used to determine the range
of a modular number.
\yskip Let us now reiterate the main points of the discussion in this
section: Mod\-ular arithmetic can be a significant advantage for
applications in which the pre\-dom\-inant calculations involve exact
multiplication (or raising to a power) of large integers, combined
with addition and subtraction, but where there is very little
need to divide or compare numbers, {\sl or to test whether intermediate
results ``overflow'' out of range.} (It is important not to
forget the latter restriction; methods are available to test
for overflow, as in exercise↔12, but they are in general so
complicated that they nullify the advantages of modular arithmetic.)
Several applications of modular computations have been discussed
by H. \α{Takahasi} and Y. \α{Ishibashi,} {\sl Information Proc.\
in Japan \bf 1} (1961), 28--42.
An example of such an application is the
\inx{linear equations}
exact solution of linear equations with rational coefficients.
For various reasons it is desirable in this case to assume that
the moduli $m↓1$, $m↓2$, $\ldotss$, $m↓r$ are all large prime numbers;
the linear equations can be solved independently modulo each
$m↓j$. A detailed discussion of this procedure has been given
by I. \α{Borosh} and A. S. \α{Fraenkel} [{\sl Math.\ Comp.\ \bf 20} (1966),
107--112]. By means of their method, the nine independent solutions
of a system of 111 linear equations in 120 unknowns were obtained
exactly in less than one hour's running time on a \α{CDC 1604} computer.
The same procedure is worthwhile also for solving simultaneous
linear equations with floating point coefficients, when the
matrix of coefficients is ill-conditioned. The modular technique
(treating the given floating point coefficients as exact rational
numbers) gives a method for obtaining the {\sl true} answers
in less time than conventional methods can produce reliable
{\sl approximate} answers!\xskip [See M. T. \α{McClellan,} {\sl JACM \bf 20}
(1973), 563--588, for further developments of this approach;
and see also E.↔H. \α{Bareiss,} {\sl J. Inst.\ Math.\ and Appl.\
\bf 10} (1972), 68--104, for a discussion of its limitations.]
The published literature concerning modular arithmetic is mostly
oriented towards hardware design, since the carry-free properties
of modular arithmetic make it attractive from the standpoint
of high-speed operation. The idea was first published by A.
\α{Svoboda} and M. \α{Valach} in the Czechoslovakian journal {\sl Stroje
na Zpracov\'an\'\i\ Informac\'\i\ \bf 3} (1955),
247--295; then independently by H. L. \α{Garner} [{\sl IRE Trans.\
\bf EC--8} (1959), 140--147]. The use of moduli of the form
$2↑{e↓j} - 1$ was suggested by A. S. \α{Fraenkel} [{\sl JACM
\bf 8} (1961), 87--96], and several advantages of such moduli
were demonstrated by A. \α{Sch\"onhage} [{\sl Computing \bf 1}
(1966), 182--196]. See the book {\sl Residue Arithmetic and
its Applications to Computer Technology} by N. S. \α{Szab\'o}
and R. I. \α{Tanaka} (New York: McGraw-Hill, 1967), for additional
information and a comprehensive bibliography of the subject.
A Russian book published in 1968 by I. \t Ia. Akushski\u\i\ and D. I.
\t Iuditski\u\i\ includes a chapter about complex moduli [cf.\
{\sl Rev.\ Roumaine des Math.\ \bf15} (1970), 159--160].
Further discussion of modular arithmetic can be found in Section 4.3.3B.
\exbegin{EXERCISES}
\exno 1. [20] Find all
integers $u$ that satisfy all of the following conditions: $u \mod 7 =
1$, $u \mod 11 = 6$, $u \mod 13 = 5$, $0 ≤ u < 1000$.
\exno 2. [M20] Would Theorem C still be true if we allowed $a$,
$u↓1$, $u↓2$, $\ldotss$, $u↓r$ and $u$ to be arbitrary real numbers
(not just integers)?
\trexno 3. [M26] ({\sl Generalized Chinese Remainder Theorem}.)\xskip
\inx{Chinese remainder theorem, generalized}
Let $m↓1$, $m↓2$, $\ldotss$, $m↓r$ be positive integers. Let $m$ be
the least common multiple of $m↓1$, $m↓2$, $\ldotss$, $m↓r$, and let
$a$, $u↓1$, $u↓2$, $\ldotss$, $u↓r$ be any integers. Prove that there
is exactly one integer $u$ that satisfies the conditions
$$a ≤ u < a + m,\qquad u ≡ u↓j\modulo {m↓j},\qquad 1
≤ j ≤ r,$$
provided that
$$u↓i ≡ u↓j\quad\biglp\hbox{modulo }\gcd(m↓i, m↓j)\bigrp ,\qquad 1 ≤
i < j ≤ r;$$
and there is no such integer $u$ when the latter condition fails to hold.
\exno 4. [20] Continue the process shown in (13); what would
$m↓7$, $m↓8$, $m↓9$, $\ldots$ be?
\trexno 5. [M23] Suppose that the method of (13) is continued
until no more $m↓j$ can be chosen; does this method give the
largest attainable value $m↓1m↓2 \ldotsm m↓r$ such that the $m↓j$
are odd positive integers less than 100 that are relatively
prime in pairs?
\exno 6. [M22] Let $e$, $f$, $g$ be nonnegative integers.\xskip (a) Show
that $2↑e ≡ 2↑f \modulo {2↑g - 1}$ if and only if $e ≡ f \modulo
g$.\xskip (b) Given that $e \mod f = d$ and $ce \mod f = 1$, prove that
$$\biglp (1 + 2↑d + \cdots + 2↑{(c-1)d}) \cdot
(2↑e - 1)\bigrp\mod (2↑f - 1) = 1.$$
$\biglp$Thus, we have a comparatively simple formula for
the inverse of $2↑e - 1$, modulo $2↑f - 1$, as required in (22).$\bigrp$
\trexno 7. [M21] Show that (23) can be rewritten as follows:
$$\eqalign{v↓1⊗← u↓1 \mod m↓1,\cr
v↓2⊗← (u↓2 - v↓1)\,c↓{12} \mod m↓2,\cr
v↓3⊗← \biglp u↓3 - (v↓1 + m↓1v↓2)\bigrp\,c↓{13}c↓{23}\mod m↓3,\cr
\vdots\cr
v↓r⊗← \biglp u↓r - (v↓1 + m↓1(v↓2 + m↓2(v↓3 +
\cdots + m↓{r-2}v↓{r-1})\ldotsm))\bigrp\,c↓{1r} \ldotsm c↓{(r-1)r}
\mod m↓r.\cr}$$
If the formulas are rewritten in this
way, we see that only $r - 1$ constants $C↓j = c↓{1j} \ldotsm
c↓{(j-1)j} \mod m↓j$ are needed instead of $r(r - 1)/2$ constants
$c↓{ij}$ as in (23). Discuss the relative merits of this version
of the formula as compared to (23), from the stand\-point of computer
calculation.
\exno 8. [M21] Prove that the number $u$ defined by (23) and
(24) satisfies (25).
\exno 9. [M20] Show how to go from the values $v↓1$, $\ldotss$,
$v↓r$ of the mixed-radix notation (24) back to the original residues
$u↓1$, $\ldotss$, $u↓r$, using only arithmetic mod $m↓j$ to compute
the value of $u↓j$.
\exno 10. [M25] An integer $u$ that lies in the symmetrical
range (10) might be represented by finding the numbers $u↓1$,
$\ldotss$, $u↓r$ such that $u ≡ u↓j \modulo {m↓j}$ and $-m↓j/2
< u↓j < m↓j/2$, instead of insisting that $0 ≤ u↓j < m↓j$ as
in the text. Discuss the modular arithmetic procedures that
would be appropriate in connection with such a symmetrical representation
$\biglp$including the conversion process, (23)$\bigrp$.
\exno 11. [M23] Assume that all the $m↓j$ are odd, and that
$u = (u↓1, \ldotss , u↓r)$ is known to be even, where $0 ≤ u
< m$. Find a reasonably fast method to compute $u/2$ using modular
arithmetic.
\exno 12. [M10] Prove that, if $0 ≤ u, v < m$, the modular addition
of $u$ and $v$ causes overflow (i.e., is outside the range allowed
by the modular representation) if and only if the sum is less
than $u$.\xskip (Thus the overflow detection problem is equivalent
to the comparison problem.)
\trexno 13. [M25] ({\sl \α{Automorphic numbers}.})\xskip An $n$-place decimal
number $x > 1$ is called an ``automorph'' by recreational mathematicians
if the last $n$ digits of $x↑2$ are equal to $x$. For example, 9376 is a 4-place
automorph, since $9376↑2 = 87909376$.\xskip
[See {\sl Scientific American \bf 218} (January 1968), 125.]
\yskip\hang\textindent{a)}Prove that an $n$-place number $x > 1$ is
an automorph if and only if $x \mod 5↑n = 0$ or 1 and $x
\mod 2↑n = 1$ or 0, respectively.\xskip $\biglp$Thus, if $m↓1 = 2↑n$ and
$m↓2 = 5↑n$, the only two $n$-place automorphs are the numbers
$M↓1$ and $M↓2$ in (7).$\bigrp$
\hang\textindent{b)}Prove that if $x$ is an $n$-place automorph,
then $(3x↑2 - 2x↑3)\mod 10↑{2n}$ is a $2n$-place automorph.
\hang\textindent{c)}Given that $cx ≡ 1\modulo y$, find
a simple formula for a number $c↑\prime$ depending on $c$ and↔$x$
but not on $y$, such that $c↑\prime
x↑2 ≡ 1\modulo {y↑2}$.
% Vicinity of page 278
% folio 372 galley 10 *
\jpar1000000
\runningrighthead{HOW FAST CAN WE MULTIPLY\:a?}
\section{4.3.3}
\sectionskip
\sectionbegin{\star4.3.3. How Fast Can We Multiply?}
The conventional method for \β{multiplication} in positional number systems,
Algorithm
4.3.1M, requires approximately $cmn$ operations to multiply
an $m$-digit number by an $n$-digit number, where $c$ is a constant.
In this section, let us assume for convenience that $m = n$,
and let us consider the following question: {\sl Does every
general computer algorithm for multiplying two $n$-digit numbers
require an execution time proportional to↔$n↑2$, as $n$ increases?}
\jpar 2
(In this question, a ``general'' algorithm
means one that accepts, as input, the number $n$ and two arbitrary
$n$-digit numbers in positional notation, and whose output is
their product in positional form. Certainly if we were allowed
to choose a different algorithm for each value of $n$, the question
would be of no interest, since multiplication could be done
for any specific value of $n$ by a ``table-lookup'' operation
in some huge table. The term ``computer algorithm'' is meant
to imply an algorithm that is suitable for implementation on
a digital computer such as \MIX, and the execution time is to
be the time it takes to perform the algorithm on such a computer.)
\subsectionbegin{A. Digital methods} The answer to
the above question is, rather surprisingly, ``No,'' and, in
fact, it is not very difficult to see why. For convenience,
let us assume throughout this section that we are working with
integers expressed in binary notation. If we have two $2n$-bit
numbers $u = (u↓{2n-1} \ldotsm u↓1u↓0)↓2$ and $v = (v↓{2n-1}
\ldotsm v↓1v↓0)↓2$, we can write
$$u = 2↑nU↓1 + U↓0,\qquad v = 2↑nV↓1 + V↓0,\eqno (1)$$
where $U↓1 = (u↓{2n-1} \ldotsm u↓n)↓2$ is the ``most significant
half'' of the number $u$ and $U↓0 = (u↓{n-1} \ldotsm u↓0)↓2$ is the ``least
significant half''; similarly $V↓1 = (v↓{2n-1} \ldotsm v↓n)↓2$ and $V↓0 =
(v↓{n-1} \ldotsm v↓0)↓2$. Now we have
$$uv = (2↑{2n} + 2↑n)U↓1V↓1 + 2↑n(U↓1 - U↓0)(V↓0 - V↓1) +
(2↑n + 1)U↓0V↓0.\eqno (2)$$
This formula reduces the problem of multiplying
$2n$-bit numbers to three multiplications of $n$-bit numbers, namely
$U↓1V↓1$, $(U↓1 - U↓0)(V↓0 - V↓1)$, and $U↓0V↓0$, plus some simple
shifting and adding operations.
Formula (2) can be used for double-precision multiplication
when we want a quadruple-precision result, and it will be just
a little faster than the traditional method on many machines.
But the main advantage of (2) is that we can use it
to define a \β{recursive process} for multiplication that is significantly
faster than the familiar order-$n↑2$ method when $n$ is large:
If $T(n)$ is the time required to perform multiplication of
$n$-bit numbers, we have
$$T(2n) ≤ 3T(n) + cn\eqno (3)$$
for some constant $c$, since the right-hand side
of (2) uses just three multiplications plus some additions and
shifts. Relation (3) implies by induction that
$$T(2↑k) ≤ c(3↑k - 2↑k),\qquad k ≥ 1,\eqno (4)$$
if we choose $c$ to be large enough so that this
inequality is valid when $k = 1$; therefore we have
$$\baselineskip 15pt\eqalignno{T(n)≤T(2↑{\lceil\lg n\rceil})⊗≤
c(3↑{\lceil\lg n\rceil}-2↑{\lceil\lg n\rceil})\cr
⊗ < 3c \cdot 3↑{\lg n} = 3cn↑{\lg 3}.⊗(5)\cr}$$
Relation (5) shows that the running time for multiplication
can be reduced from order $n↑2$ to order $n↑{\lg3} \approx
n↑{1.585}$, so the recursive method is much faster than the traditional method when
$n$ is large.
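Concretely, the recursion might be sketched in Python as follows; the
handling of the possibly negative factors $U↓1 - U↓0$ and $V↓0 - V↓1$ is an
implementation detail supplied here, not spelled out in (2):

    def karatsuba(u, v):
        # product of nonnegative integers via formula (2); a sketch only
        n = max(u.bit_length(), v.bit_length())
        if n <= 32:
            return u * v                    # built-in multiply as base case
        h = n // 2                          # split into halves of about h bits
        U1, U0 = u >> h, u & ((1 << h) - 1)
        V1, V0 = v >> h, v & ((1 << h) - 1)
        A = karatsuba(U1, V1)
        B = karatsuba(U0, V0)
        d, e = U1 - U0, V0 - V1
        C = karatsuba(abs(d), abs(e))
        if (d < 0) != (e < 0):
            C = -C
        return (A << (2 * h)) + ((A + B + C) << h) + B   # (2), regrouped

    u, v = 12345678901234567890, 98765432109876543210
    assert karatsuba(u, v) == u * v
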
(A similar but more complicated method for doing multiplication
with running time of order $n↑{\lg3}$ was apparently first suggested
by A. \α{Karatsuba} in {\sl Doklady Akad.\ Nauk SSSR \bf 145}
(1962), 293--294 [English translation in {\sl Soviet Physics--Doklady \bf7}
(1963), 595--596]. Curiously, this idea does not seem to have
been discovered before 1962; none of the ``\α{calculating prodigies}''
\inx{mental arithmetic}
who have become famous for their ability to multiply large numbers
mentally have been reported to use any such method, although
formula (2) adapted to decimal notation would seem to lead to
a reasonably easy way to multiply eight-digit numbers in one's
head.)
The running time can be reduced still further, in the limit
as $n$ approaches infinity, if we observe that the method just
used is essentially the special case $r = 1$ of a more general
method that yields
$$T\biglp (r + 1)n\bigrp ≤ (2r + 1)T(n) + cn\eqno (6)$$
for any fixed $r$. This more general method
can be obtained as follows: Let
$$u = (u↓{(r+1)n-1} \ldotsm u↓1u↓0)↓2\qquad\hbox{and}\qquad v=
(v↓{(r+1)n-1} \ldotsm v↓1v↓0)↓2$$
be broken into $r + 1$ pieces,
$$u = U↓r2↑{rn} + \cdots + U↓12↑n + U↓0,\qquad v = V↓r2↑{rn}
+ \cdots + V↓12↑n + V↓0,\eqno (7)$$
where each $U↓j$ and each $V↓j$ is an $n$-bit number.
Consider the polynomials
$$U(x) = U↓rx↑r + \cdots + U↓1x + U↓0,\qquad V(x) = V↓rx↑r
+ \cdots + V↓1x + V↓0,\eqno (8)$$
and let
$$W(x) = U(x)V(x) = W↓{2r}x↑{2r} + \cdots + W↓1x + W↓0.\eqno(9)$$
Since $u = U(2↑n)$ and $v = V(2↑n)$, we have $uv
= W(2↑n)$, so we can easily compute $uv$ if we know the coefficients
of $W(x)$. The problem is to find a good way to compute the
coefficients of $W(x)$ by using only $2r + 1$ multiplications
of $n$-bit numbers plus some further operations that involve only an execution
time proportional to $n$. This can be done by computing
$$U(0)V(0) = W(0),\quad U(1)V(1) = W(1),\quad \ldotss ,\quad
U(2r)V(2r) = W(2r).\eqno (10)$$
The coefficients of a polynomial of degree $2r$
can be written as a linear combination of the values of that
polynomial at $2r + 1$ distinct points; computing such a linear combination
requires an execution time at most proportional to↔$n$.\xskip (Actually,
the products $U(j)V(j)$ are not strictly products of $n$-bit
numbers, but they are products of at most $(n + t)$-bit numbers,
where $t$ is a fixed value depending on $r$. It is easy to design
a multiplication routine for $(n+t)$-bit numbers that requires only
$T(n)+c↓1n$ operations, where $T(n)$ is the number of operations needed for
$n$-bit multiplications, since two products of $t$-bit by $n$-bit numbers
can be done in $c↓2n$ operations when $t$ is fixed.)\xskip Therefore we obtain a
method of multiplication satisfying (6).
Relation (6) implies that
$T(n) ≤ c↓3n↑{\log↓{r+1}(2r+1)}
< c↓3n↑{1+\log↓{r+1}2}$, if we argue as in
the derivation of (5), so we have now proved the following result:
\thbegin Theorem A. {\sl Given $ε > 0$, there exists a multiplication algorithm
such that the number of
elementary operations $T(n)$ needed to multiply two $n$-bit numbers
satisfies
$$T(n) < c(ε)n↑{1+ε},\eqno (11)$$
for some constant $c(ε)$ independent of $n$.}\quad\blackslug
\yyskip This theorem is still not the result
we are after. It is unsatisfactory for practical purposes in
that the method becomes much more complicated as $ε → 0$ (and
therefore as $r → ∞$), causing $c(ε)$ to grow so rapidly that
extremely huge values of $n$ are needed before we have any significant
improvement over (5). And it is unsatisfactory for theoretical
purposes because it does not make use of the full power of the
polynomial method on which it is based. We can obtain a better
result if we let $r$ {\sl vary} with $n$, choosing larger and
larger values of $r$ as $n$ increases. This idea is due to A.
L. \α{Toom} [{\sl Doklady Akad.\ Nauk SSSR \bf 150} (1963), 496--498,
English translation in {\sl Soviet Mathematics \bf 3} (1963), 714--716],
who used it to show that computer circuitry for multiplication
of $n$-bit numbers can be constructed involving a fairly small
number of components as $n$ grows. S. A. \α{Cook} [{\sl On the minimum
computation time of functions} (Thesis, Harvard University,
1966), 51--77] later showed how Toom's method can be adapted
to fast computer programs.
% Vicinity of page 280
% folio 376 galley 11 WARNING: Bad spots on this tape. *
Before we discuss the Toom--Cook algorithm any further, let us
study a small example of the transition from $U(x)$ and $V(x)$
to the coefficients of $W(x)$. This example will not demonstrate
the efficiency of the method, since the numbers are too small,
but it points out some useful simplifications that we can make
in the general case. Suppose that we want to multiply $u = 1234$
times $v = 2341$; in binary notation this is $u = (0100\ 1101\ 0010)↓2$ times
$v=(1001\ 0010\ 0101)↓2$. Let $r=2$; the polynomials $U(x)$, $V(x)$ in (8) are
$$U(x)=4x↑2+13x+2,\qquad V(x)=9x↑2+2x+5.$$
Hence we find, for $W(x)=U(x)V(x)$,
$$\vbox{\halign to size{$\hfill#(0)=\null$⊗\hfill#,\tabskip 0pt plus 100pt
⊗$\hfill#(1)=\null$\tabskip0pt⊗\hfill#,\tabskip 0pt plus 100pt
⊗$\hfill#(2)=\null$\tabskip0pt⊗\hfill#,\tabskip 0pt plus 100pt
⊗$\hfill#(3)=\null$\tabskip0pt⊗\hfill#,\tabskip 0pt plus 100pt
⊗$\hfill#(4)=\null$\tabskip0pt⊗\hfill#\cr
U⊗2⊗U⊗19⊗U⊗44⊗U⊗77⊗U⊗118;\cr
V⊗5⊗V⊗16⊗V⊗45⊗V⊗92⊗V⊗157;\cr
W⊗10⊗W⊗304⊗W⊗1980⊗W⊗7084⊗W⊗18526.\cr}}\eqno(12)$$
Our job now is to compute the five coefficients
of $W(x)$ from the latter five values.
There is an attractive little algorithm that can be used to
compute the coefficients of a polynomial $W(x) = W↓{m-1}x↑{m-1}
+ \cdots + W↓1x + W↓0$ when the values $W(0), W(1), \ldotss ,
W(m - 1)$ are given: Let us first write
\def\\{\hskip1pt}
$$W(x) = \theta ↓{m-1}\\x↑{\underline{m-1}}+\theta↓{m-2}\\x↑{\underline{m-2}}+\cdots
+ \theta ↓1x↑{\underline1} + \theta ↓0,\eqno (13)$$
where $x↑{\underline k} = x(x - 1) \ldotsm (x - k + 1)$, and
\inx{interpolation}
where the coefficients $\theta ↓j$ are unknown. The falling \α{factorial powers}
have the important property that
$$W(x + 1) - W(x) = (m - 1)\theta ↓{m-1}\\x↑{\underline{m-2}} + (m - 2)\theta
↓{m-2}\\x↑{\underline{m-3}} + \cdots + \theta ↓1;$$
hence by induction we find that, for all $k ≥ 0$,
$$\vbox{\halign{\hbox to size{$\dispstyle#$}\cr
\9{1\over k!}\biggglp W(x\!+\!k) - {k\choose1}W(x\!+\!k\!-\!1)+
{k\choose2}W(x\!+\!k\!-\!2)-\cdots+(-1)↑kW(x)\bigggrp\hfill\cr
\noalign{\vskip 4pt}
\quad\9= {m-1\choose k}\,\theta↓{m-1}\\x↑{\underline{m-1-k}}
+{m-2\choose k}\,\theta↓{m-2}\\x↑{\underline{m-2-k}}+\cdots +{k\choose k}
\,\theta ↓k.\hfill(14)\cr}}$$
Denoting the left-hand side of (14) by $(1/k!)\,{\Delta\!} ↑k\,W(x)$, we see that
$${1\over k!} {\Delta\!} ↑k\,W(x) = {1\over k} \left({1\over (k -
1)!} {\Delta\!} ↑{k-1}W(x + 1) - {1\over (k - 1)!} {\Delta\!} ↑{k-1}W(x)\right)$$
and $(1/k!)\,{\Delta\!} ↑k\,W(0) = \theta ↓k$. So the
coefficients $\theta ↓j$ can be evaluated using a very simple
method, illustrated here for the polynomial $W(x)$ in (12):
$$\baselineskip 14pt\def\\{\noalign{\vskip-7pt}}
\vbox{\halign to size{$\hfill#$\tabskip0pt plus 100pt
⊗$\hfill#$⊗$\hfill#$⊗$\hfill#$⊗$\hfill#$⊗#\tabskip0pt\cr
10\cr\\
⊗ 294\cr\\
304⊗⊗1382/2 = \9691\cr\\
⊗ 1676⊗⊗1023/3 = 341\cr\\
1980⊗⊗3428/2 = 1714⊗⊗144/4 = 36⊗(15)\cr\\
⊗ 5104⊗⊗1455/3 = 485\cr\\
7084⊗⊗6338/2 = 3169\cr\\
⊗11442\cr\\
18526\cr}}$$
The leftmost column of this tableau is a listing
of the given values of $W(0)$, $W(1)$, $\ldotss$, $W(4)$; the $k$th
succeeding column is obtained by computing the difference between
successive values of the preceding column and dividing by $k$.
The coefficients $\theta ↓j$ appear at the top of the columns,
so that $\theta ↓0 = 10$, $\theta ↓1 = 294$, $\ldotss$, $\theta ↓4
= 36$, and we have
$$\eqalignno{W(x)⊗=36x↑{\underline4}+341x↑{\underline3}+691x↑{\underline2}
+294x↑{\underline1}+10\cr
⊗= \biglp ((36(x - 3) + 341)(x - 2) + 691)(x - 1) + 294\bigrp x
+ 10.⊗(16)\cr}$$
In general, we can write
$$\twoline{W(x)=\biglp\ldotsm((\theta↓{m-1}(x\!-\!m\!+\!2)+\theta↓{m-2})(x\!
-\!m\!+\!3)+\theta↓{m-3})(x\!-\!m\!+\!4)}{1pt}{\null
+ \cdots + \theta ↓1\bigrp\,x + \theta↓0,}$$
and this formula shows how the coefficients $W↓{m-1}$,
$\ldotss$, $W↓1$, $W↓0$ can be obtained from the $\theta $'s:
$$\vcenter{\baselineskip0pt\lineskip0pt
\def\\{\vrule height 9.5pt depth 2.5pt}\def\¬{\vrule height 2pt}
\halign{\hbox to 29.6pt{$\hfill#$\quad}⊗#⊗\!
\hbox to 59.6pt{$\hfill#$\quad}⊗#⊗\!
\hbox to 59.6pt{$\hfill#$\quad}⊗#⊗\!
\hbox to 59.6pt{$\hfill#$\quad}⊗#⊗\!
\hbox to 29.6pt{$\hfill#$\quad}\cr
\noalign{\hrule width 30pt}
⊗\¬\cr
36⊗\\⊗341\cr
⊗\\⊗-3\cdot36\cr
⊗\¬\cr
\noalign{\hrule width 90pt}
⊗⊗⊗\¬\cr
36⊗⊗233⊗\\⊗691\cr
⊗⊗-2\cdot36⊗\\⊗-2\cdot233\cr
⊗⊗⊗\¬\cr
\noalign{\hrule width 150pt}
⊗⊗⊗⊗⊗\¬\cr
36⊗⊗161⊗⊗225⊗\\⊗294\cr
⊗⊗-1\cdot36⊗⊗-1\cdot161⊗\\⊗-1\cdot225\cr
⊗⊗⊗⊗⊗\¬\cr
\noalign{\hrule width 210pt}
⊗⊗⊗⊗⊗⊗⊗\¬\cr
36⊗⊗125⊗⊗64⊗⊗69⊗\\⊗10\cr
⊗⊗⊗⊗⊗⊗⊗\¬\cr}}\eqno(17)$$
Here the numbers below the horizontal lines successively
show the coefficients of the polynomials
$$\twoline{\theta↓{m-1},\qquad\theta↓{m-1}(x-m+2)+\theta↓{m-2},}{3pt}{\biglp
\theta↓{m-1}(x-m+2)+\theta↓{m-2}\bigrp(x-m+3)+\theta↓{m-3},\qquad\hbox{etc.}}$$
From this tableau we have
$$W(x) = 36x↑4 + 125x↑3 + 64x↑2 + 69x + 10,$$
so the answer to our original problem is $1234 \cdot
2341 = W(16)$, where $W(16)$ is obtained by adding and shifting.
A generalization of this method for obtaining coefficients is
discussed in Section 4.6.4.
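The two passes (15) and (17) amount to only a few lines of Python; applied to
the values in (12), this sketch returns the coefficients just found (the
function name is ours):

    def poly_from_values(W):
        # coefficients of a degree-(m-1) polynomial from W(0), ..., W(m-1)
        W, m = list(W), len(W)
        for j in range(1, m):               # divided differences, tableau (15)
            for t in range(m - 1, j - 1, -1):
                W[t] = (W[t] - W[t - 1]) // j
        for j in range(m - 2, 0, -1):       # falling factorials to powers, (17)
            for t in range(j, m - 1):
                W[t] -= j * W[t + 1]
        return W                            # W[k] is the coefficient of x^k

    assert poly_from_values([10, 304, 1980, 7084, 18526]) == [10, 69, 64, 125, 36]
    assert sum(c * 16**k for k, c in enumerate([10, 69, 64, 125, 36])) == 1234 * 2341
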
The basic \α{Stirling number} identity,
$$x↑n = {n\comb\{\} n}\,x↑{\underline n} + \cdots + {n\comb\{\}1}\,x↑{\underline1}
+{n\comb\{\}0},$$
Eq.\ 1.2.6--41, shows that if the coefficients of
$W(x)$ are nonnegative, so are the numbers $\theta ↓j$, and
in such a case {\sl all of the intermediate results in the above
computation are nonnegative.} This further simplifies the \α{Toom}--\α{Cook}
multiplication algorithm, which we will now consider in detail.
\algbegin Algorithm C (High-precision multiplication of binary numbers). Given
a positive integer $n$ and two nonnegative $n$-bit integers $u$ and $v$, this
algorithm forms their $2n$-bit product, $w$. Four auxiliary stacks are used to
hold the long numbers that are manipulated during the procedure:
$$\vbox{\halign{#\hfill\qquad⊗#\hfill\cr
Stacks $U$, $V$:⊗Temporary storage
of $U(j)$ and $V(j)$ in step C4.\cr
Stack $C$:⊗Numbers to be multiplied, and control codes.\cr
Stack $W$:⊗Storage of $W(j)$.\cr}}$$
These stacks may contain either binary numbers or special control
symbols called code-1, code-2, and code-3. The algorithm
also constructs an auxiliary table of numbers $q↓k$, $r↓k$; this
table is maintained in such a manner that it may be stored as
a linear list, where a single pointer that traverses the
list (moving back and forth) may be used to access the current
table entry of interest.
(Stacks $C$ and $W$ are used to control the
\β{recursive} mechanism of this multiplication algorithm in a reasonably
straightforward manner that is a special case of general
procedures discussed in Chapter↔8.)
\aalgstep C1. [Compute $q, r$ tables.] Set stacks
$U$, $V$, $C$, and $W$ empty. Set
$$k ← 1,\qquad q↓0 ← q↓1 ← 16,\qquad r↓0 ← r↓1 ← 4,\qquad
Q ← 4,\qquad R ← 2.$$
Now if $q↓{k-1} + q↓k < n$, set
$$k ← k + 1,\hskip1.5em Q ← Q + R,\hskip1.5em R ← \lfloor \sqrt Q\,\rfloor,
\hskip1.5em q↓k ← 2↑Q,\hskip1.5em r↓k ← 2↑R,$$
and repeat this operation until $q↓{k-1}
+ q↓k ≥ n$.\xskip $\biglp${\sl Note:} The calculation of $R ← \lfloor \sqrt Q\,
\rfloor$ does not require a square root to
be taken, since we may simply set $R ← R + 1$ if $(R + 1)↑2
≤ Q$ and leave $R$ unchanged if $(R + 1)↑2 > Q$; see exercise↔2.
In this step we build the sequences
$$\baselineskip15pt
\vbox{\halign{$\hfill#\;=\;\null$⊗$\ctr{#}$\qquad⊗$\ctr{#}$\qquad⊗$\ctr{#}$\qquad
⊗$\ctr{#}$\qquad⊗$\ctr{#}$\qquad⊗$\ctr{#}$\qquad⊗$\ctr{#}\qquad\ldots$\cr
k ⊗0⊗1⊗2⊗3⊗4⊗5⊗6\cr
q↓k ⊗2↑4⊗2↑4⊗2↑6⊗2↑8⊗2↑{10}⊗2↑{13}⊗2↑{16}\cr
r↓k ⊗2↑2⊗2↑2⊗2↑2⊗2↑2⊗2↑3⊗2↑3⊗2↑4\cr}}$$
The multiplication of 70000-bit numbers would
cause this step to terminate with $k = 6$, since $70000 < 2↑{13}
+ 2↑{16}$.$\bigrp$
\aalgstep C2. [Put $u, v$ on stack.] Put code-1 on
stack↔$C$, then place $u$ and $v$ onto stack↔$C$ as numbers
of exactly $q↓{k-1} + q↓k$ bits each.
\aalgstep C3. [Check recursion level.] Decrease $k$ by 1. If
$k = 0$, the top of stack↔$C$ now contains two 32-bit numbers, $u$
and $v$; remove them, set $w ← uv$ using a built-in routine for multiplying
32-bit numbers, and go to step C10. If $k > 0$, set $r ← r↓k$,
$q ← q↓k$, $p ← q↓{k-1}+q↓k$, and go on to step C4.
% Vicinity of page 283
% folio 379 galley 12 *
\aalgstep C4. [Break into $r + 1$
parts.] Let the number at the top of stack↔$C$ be regarded as
a list of $r + 1$ numbers with $q$ bits each, $(U↓r \ldotsm U↓1U↓0)↓{2↑q}$.\xskip
(The top of stack↔$C$ now contains an $(r + 1)q = (q↓k +
q↓{k+1})$-bit number.)\xskip For $j = 0$, 1, $\ldotss$,↔$2r$, compute the
$p$-bit numbers
$$\biglp\ldotsm(U↓rj + U↓{r-1})j + \cdots + U↓1\bigrp j + U↓0 = U(j)$$
and successively put these values onto
stack↔$U$.\xskip (The bottom of stack↔$U$ now contains $U(0)$, then comes
$U(1)$, etc., with $U(2r)$ on top. Note that
$$ U(j) ≤ U(2r) < 2↑q\biglp (2r)↑r + (2r)↑{r-1} + \cdots
+ 1\bigrp < 2↑{q+1}(2r)↑r ≤ 2↑p,$$
by exercise↔3.)\xskip Then remove $U↓r \ldotsm U↓1U↓0$ from stack↔$C$.
\hangindent 24pt after 0
Now the top of stack↔$C$ contains
another list of $r + 1$ $q$-bit numbers, $V↓r \ldotsm V↓1V↓0$,
and the $p$-bit numbers
$$\biglp\ldotsm(V↓rj + V↓{r-1})j + \cdots + V↓1\bigrp j+ V↓0 = V(j)$$
should be put onto stack↔$V$ in the same
way. After this has been done, remove $V↓r \ldotsm V↓1V↓0$ from
stack↔$C$.
\aalgstep C5. [Recurse.] Successively put the following
items onto stack↔$C$, at the same time emptying stacks $U$
and↔$V$:
$$\eqalign{\hbox{code-2, $V(2r)$, $U(2r)$, }
⊗\hbox{code-3, $V(2r - 1)$, $U(2r- 1)$, $\ldotss $,}\cr
⊗\hbox{\quad code-3, $V(1)$, $U(1)$, code-3, $V(0)$, $U(0)$.}\cr}$$
Go back to step C3.
\aalgstep C6. [Save one product.] $\biglp$At this point the multiplication
algorithm has set $w$ to one of the products $W(j) = U(j)V(j)$.$\bigrp$
Put $w$ onto stack↔$W$.\xskip (This number $w$ contains $2(q↓k + q↓{k-1})$
bits.)\xskip Go back to step C3.
\topinsert{\vskip 53mm
\inx{COPY PREP: Fig 4-8 to be inserted}
\ctrline{\caption Fig.\ 8. \α{Toom}--\α{Cook} algorithm
for high-precision multiplication.}}
\aalgstep C7. [Find $\theta$'s.] Set $r
← r↓k$, $q ← q↓k$, $p ← q↓{k-1} + q↓k$.\xskip (At this point stack↔$W$
contains a sequence of numbers ending with $W(0)$, $W(1)$, $\ldotss$, $W(2r)$ from
bottom to top, where each $W(j)$ is a $2p$-bit number.)
\hangindent 24pt after 0
Now for $j = 1$, 2, 3, $\ldotss$, $2r$,
perform the following loop: For $t = 2r$, $2r - 1$, $2r - 2$, $\ldotss
$, $j$, set $W(t) ← \biglp W(t)-
W(t - 1)\bigrp /j$.\xskip $\biglp$Here $j$ must increase
and $t$ must decrease. The quantity $\biglp W(t) - W(t - 1)\bigrp
/j$ will always be a nonnegative integer that fits in $2p$
bits; cf.\ (15).$\bigrp$
\aalgstep C8. [Find $W$'s.] For $j = 2r - 1$, $2r - 2$, $\ldotss$,
1, perform the following loop: For $t = j$, $j + 1$, $\ldotss$,
$2r - 1$, set $W(t) ← W(t) - jW(t + 1)$.\xskip $\biglp$Here $j$ must decrease
and $t$ must increase. The result of this operation will again
be a nonnegative $2p$-bit integer; cf.\ (17).$\bigrp$
\aalgstep C9. [Set answer.] Set $w$ to the $2(q↓k + q↓{k+1})$-bit
integer
$$\biglp\ldotsm\biglp W(2r)2↑q + W(2r - 1)\bigrp 2↑q + \cdots + W(1)
\bigrp 2↑q + W(0).$$
Remove $W(2r)$, $\ldotss$, $W(0)$ from stack↔$W$.
\aalgstep C10. [Return.] Set $k ← k + 1$. Remove
the top of stack↔$C$. If it is code-3, go to step↔C6. If it is code-2,
put $w$ onto stack↔$W$ and go to step↔C7. And if it is \hbox{code-1}, terminate
the algorithm ($w$ is the answer).\quad\blackslug
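Recursion makes the control structure of Algorithm↔C easier to see than the
explicit stacks do. The following Python sketch restates the algorithm in
recursive form, with Python's built-in multiplication standing in for the
32-bit routine of step C3; all names are ours:

    from math import isqrt

    def qr_tables(n):
        # step C1: build the tables q_k = 2^Q and r_k = 2^R
        q, r, Q, R = [16, 16], [4, 4], 4, 2
        while q[-2] + q[-1] < n:
            Q += R
            R = isqrt(Q)                    # floor square root, cf. exercise 2
            q.append(1 << Q)
            r.append(1 << R)
        return q, r

    def algorithm_c(u, v, n):
        q, r = qr_tables(n)                 # u, v < 2^n
        return mult(u, v, len(q) - 1, q, r)

    def mult(u, v, k, q, r):
        k -= 1                              # step C3
        if k == 0:
            return u * v                    # 32-bit built-in multiplication
        rr, qq = r[k], q[k]
        mask = (1 << qq) - 1
        U = [(u >> (qq * i)) & mask for i in range(rr + 1)]   # step C4
        V = [(v >> (qq * i)) & mask for i in range(rr + 1)]
        W = [mult(sum(c * j**i for i, c in enumerate(U)),
                  sum(c * j**i for i, c in enumerate(V)), k, q, r)
             for j in range(2 * rr + 1)]                      # steps C5, C6
        for j in range(1, 2 * rr + 1):                        # step C7
            for t in range(2 * rr, j - 1, -1):
                W[t] = (W[t] - W[t - 1]) // j
        for j in range(2 * rr - 1, 0, -1):                    # step C8
            for t in range(j, 2 * rr):
                W[t] -= j * W[t + 1]
        w = 0                                                 # step C9
        for c in reversed(W):
            w = (w << qq) + c
        return w

    u, v = 1234567891234567891, 9876543219876543219
    assert algorithm_c(u, v, 64) == u * v
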
\yyskip Let us now
estimate the running time, $T(n)$, for Algorithm↔C, in terms
of some things we shall call ``cycles,'' i.e., elementary machine
operations. Step↔C1 takes $O(q↓k)$ cycles, even if we represent
the number $q↓k$ internally as a long string of $q↓k$ bits followed
by some delimiter, since $q↓k + q↓{k-1} + \cdots + q↓0$ will
be $O(q↓k)$. Step↔C2 obviously takes $O(q↓k)$ cycles.
Now let $t↓k$ denote the amount of computation required to get
from step↔C3 to step↔C10 for a particular value of $k$ (after
$k$ has been decreased at the beginning of step↔C3). Step↔C3
requires $O(q)$ cycles at most. Step↔C4 involves $r$ multiplications
of $p$-bit numbers by $(\lg2r)$-bit numbers, and $r$ additions
of $p$-bit numbers, all repeated $4r + 2$ times. Thus we need
a total of $O(r↑2q\log r)$ cycles. Step↔C5 requires moving
$4r + 2$ $p$-bit numbers, so it involves $O(rq)$ cycles. Step↔C6
requires $O(q)$ cycles, and it is done $2r + 1$ times per
iteration. The recursion involved when the algorithm essentially
invokes itself (by returning to step↔C3) requires $t↓{k-1}$
cycles, $2r + 1$ times. Step↔C7 requires $O(r↑2)$ subtractions
of $p$-bit numbers and divisions of $2p$-bit by $(\lg2r)$-bit
numbers, so it requires $O(r↑2q\log r)$ cycles. Similarly,
step↔C8 requires $O(r↑2q\log r)$ cycles. Step↔C9 involves $O(rq)$
cycles, and C10 takes hardly any time at all.
Summing up, we have $T(n) = O(q↓k)+ O(q↓k) + t↓{k-1}$, where
(if $q = q↓k$ and $r = r↓k$) the main contribution to the running
time satisfies
$${\eqalign{t↓k⊗= O(q) + O(r↑2q\log r) + O(rq) + (2r + 1)O(q) +
O(r↑2q\log r)\cr
⊗\hskip 120pt + O(r↑2q\log r) + O(rq) + O(q)
+ (2r + 1)t↓{k-1}\cr
\noalign{\vskip 3pt}
⊗ = O(r↑2q\log r) + (2r + 1)t↓{k-1}.\cr}}$$
Thus there is a constant $c$ such that
$$t↓k ≤ cr↑{2}↓{k}q↓k\lg r↓k + (2r↓k + 1)t↓{k-1}.$$
To complete the estimation of $t↓k$ we can prove
by brute force that
$$t↓k ≤ Cq↓{k+1}2↑{2.5\sqrt{\,\lg q↓{k+1}}}\eqno (18)$$
for some constant $C$. Let us choose $C > 20c$,
and let us also take $C$ large enough so that (18) is valid
for $k ≤ k↓0$, where $k↓0$ will be specified below. Then when
$k > k↓0$, let $Q↓k =\lg q↓k$, $R↓k =\lg r↓k$; we have by
induction
$$t↓k ≤ cq↓kr↑{2}↓{k}\lg r↓k + (2r↓k + 1)Cq↓k2↑{2.5\sqrt{Q↓k}}
= Cq↓{k+1}2↑{2.5\sqrt{\,\lg q↓{k+1}}}(\eta ↓1 + \eta ↓2),$$
where
$$\eqalign{\eta ↓1 ⊗ = {c\over C} R↓k2↑{R↓k-2.5\sqrt{Q↓{k+1}}} < {1\over
20} R↓k2↑{-R↓k} < 0.05,\cr
\noalign{\vskip 3pt}
\eta ↓2 ⊗= \left(2
+ {1\over r↓k}\right) 2↑{2.5(\sqrt{Q↓k}-\sqrt{Q↓{k+1}})} → 2↑{-1/4} < 0.85,\cr}$$
since
$$\sqrt{Q↓{k+1}} - \sqrt{Q↓k} = \sqrt{Q↓k+\lfloor\sqrt{Q↓k}\rfloor}
- \sqrt{Q↓k} → \textstyle{1\over 2}$$
as $k → ∞$. It follows that we can find $k↓0$ such
that $\eta ↓2 < 0.95$ for all $k > k↓0$, and this completes
the proof of (18) by induction.
Finally, therefore, we may compute $T(n)$. Since $n > q↓{k-1}
+ q↓{k-2}$, we have $q↓{k-1} < n$; hence
$$r↓{k-1} = 2↑{\lfloor\sqrt{\,\lg q↓{k-1}}\rfloor } < 2↑{\sqrt{\,\lg n}},\qquad
\hbox{and}\qquad q↓k = r↓{k-1}q↓{k-1} < n2↑{\sqrt{\,\lg n}}.$$
Thus
$$t↓{k-1} ≤ Cq↓k2↑{2.5\sqrt{\,\lg q↓k}} < Cn2↑{\sqrt{\,\lg n}\,
+2.5(\sqrt{\,\lg n}\,+1)},$$
and, since $T(n) = O(q↓k) + t↓{k-1}$, we have finally derived
the following theorem:
\thbegin Theorem C. {\sl There is a constant $c↓0$ such
that the execution time of Algorithm↔C
is less than $c↓0n2↑{3.5\sqrt{\,\lg n}}$
cycles.}\quad\blackslug
\yyskip Since $n2↑{3.5\sqrt{\,\lg n}} = n↑{1+3.5/\sqrt{\,\lg n}}$, this result is
noticeably stronger than Theorem↔A\null. By adding
a few complications to the algorithm, pushing the ideas to their
apparent limits (see exercise↔5), we can improve the estimated
execution time to
$$T(n) = O(n2↑{\sqrt{2\lg n}}\log n).\eqno (19)$$
\subsectionbegin{B. A \β{modular} method} There is another
way to multiply large numbers very rapidly, based on the ideas
of modular arithmetic as presented in Section 4.3.2. It is very
hard to believe at first that this method can be of advantage,
since a multiplication algorithm based on modular arithmetic
must include the choice of moduli and the conversion of numbers
into and out of modular representation, besides the actual multiplication
operation itself. In spite of these formidable difficulties,
A. \β{Sch\"onhage} discovered that all of these operations can be carried out
quite rapidly.
% Vicinity of page 286
% folio 382 galley 13 *
In order to understand
the essential mechanism of Sch\"onhage's method, we shall
look at a special case. Consider the sequence defined by the
rules
$$q↓0 = 1,\qquad q↓{k+1} = 3q↓k - 1,\eqno (20)$$
so that $q↓k = 3↑k - 3↑{k-1} - \cdots - 1 = {1\over
2}(3↑k + 1)$. We will study a procedure that multiplies $(18q↓k
+ 8)$-bit numbers, in terms of a method for multiplying $(18q↓{k-1}
+ 8)$-bit numbers. Thus, if we know how to multiply numbers
having $(18q↓0 + 8) = 26$ bits, the procedure to be described
will show us how to multiply numbers of $(18q↓1 + 8) = 44$ bits,
then 98 bits, then 260 bits, etc., eventually increasing the
number of bits by almost a factor of 3 at each step.
Let $p↓k = 18q↓k + 8$. When multiplying $p↓k$-bit numbers, the
idea is to use the six moduli
$$\baselineskip 15pt
\lpile{m↓1 = 2↑{6q↓k-1}-1,\cr m↓4=2↑{6q↓k+3}-1,\cr}\qquad
\lpile{m↓2 = 2↑{6q↓k+1}-1,\cr m↓5=2↑{6q↓k+5}-1,\cr}\qquad
\lpile{m↓3 = 2↑{6q↓k+2}-1,\cr m↓6=2↑{6q↓k+7}-1.\cr}\eqno(21)$$
These moduli are relatively prime, by Eq.\ 4.3.2--18,
since the exponents
$$6q↓k - 1,\quad 6q↓k + 1,\quad 6q↓k + 2,\quad 6q↓k + 3,\quad
6q↓k + 5,\quad 6q↓k + 7\eqno (22)$$
are always relatively prime (see exercise↔6). The
six moduli in (21) are capable of representing numbers up to
$m = m↓1m↓2m↓3m↓4m↓5m↓6 > 2↑{36q↓k+16} = 2↑{2p↓k}$,
so there is no chance of overflow in the multiplication
of $p↓k$-bit numbers $u$ and $v$. Thus we may use the following
method, when $k>0$:
\yyskip\hang\textindent{a)}Compute $u↓1 = u \mod m↓1$, $\ldotss$, $u↓6 = u \mod
m↓6$; and $v↓1 = v \mod m↓1$, $\ldotss$, $v↓6 = v \mod m↓6$.
\yskip\hang\textindent{b)}Multiply $u↓1$ by $v↓1$, $u↓2$ by $v↓2$, $\ldotss$, $u↓6$
by $v↓6$. These are numbers of at most $6q↓k + 7 = 18q↓{k-1}
+ 1 < p↓{k-1}$ bits, so the multiplications can be performed
by using the assumed $p↓{k-1}$-bit multiplication procedure.
\yskip\hang\textindent{c)}Compute $w↓1 = u↓1v↓1 \mod m↓1$, $w↓2 = u↓2v↓2 \mod
m↓2$, $\ldotss$, $w↓6 = u↓6v↓6 \mod m↓6$.
\yskip\hang\textindent{d)}Compute $w$ such that $0 ≤ w < m$, $w \mod m↓1 = w↓1$,
$\ldotss$, $w \mod m↓6 = w↓6$.
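Treating the recursive multiplications as built-in products for the moment,
these four steps can be sketched in Python, reusing the mod_mersenne and
garner sketches of the previous section (the function name is ours):

    def modular_mult(u, v, qk):
        # steps (a)-(d) for p_k-bit u and v, with the six moduli of (21)
        es = [6*qk - 1, 6*qk + 1, 6*qk + 2, 6*qk + 3, 6*qk + 5, 6*qk + 7]
        ms = [(1 << e) - 1 for e in es]
        us = [mod_mersenne(u, e) % m for e, m in zip(es, ms)]   # step (a)
        vs = [mod_mersenne(v, e) % m for e, m in zip(es, ms)]
        ws = [(a * b) % m for a, b, m in zip(us, vs, ms)]       # steps (b), (c)
        return garner(ws, ms)                                   # step (d)

    qk = 2                       # q_1 = 2, so p_1 = 44: 44-bit operands
    u, v = (1 << 43) + 12345, (1 << 43) + 67890
    assert modular_mult(u, v, qk) == u * v
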
\yyskip Let $t↓k$ be the amount
of time needed for this process. It is not hard to see that
operation (a) takes $O(p↓k)$ cycles, since the determination
of $u \mod(2↑e - 1)$ is quite simple (like ``casting out nines''),
as shown in Section 4.3.2. Similarly, operation (c) takes $O(p↓k)$
cycles. Operation (b) requires essentially $6t↓{k-1}$ cycles.
This leaves us with operation (d), which seems to be quite a
difficult computation; but Sch\"onhage has found an ingenious
way to perform step (d) in $O(p↓k \log p↓k)$ cycles, and this
is the crux of the method. As a consequence, we have
$$t↓k = 6t↓{k-1} + O(p↓k \log p↓k).$$
Since $p↓k = 3↑{k+2} + 17$, we can show that the time for $n$-bit multiplication is
$$T(n)= O(n↑{\log↓3 6})=O(n↑{1.63}).\eqno (23)$$
(See exercise↔7.)
Although the modular method is more
complicated than the $O(n↑{\lg3})$ procedure discussed at the beginning
of this section, Eq.\ (23) shows that it does, in fact, lead to an execution time
substantially better than $O(n↑2)$ for the multiplication of
$n$-bit numbers. Thus we can improve on the classical method
by using either of two completely different approaches.
Let us now analyze operation (d) above. Assume that we are given a set of
positive integers $e↓1 < e↓2 < \cdots < e↓r$, relatively
prime in pairs; let
$$m↓1 = 2↑{e↓1} - 1,\qquad m↓2 = 2↑{e↓2} - 1,\qquad
\ldotss ,\qquad m↓r = 2↑{e↓r} - 1.\eqno (24)$$
We are also given numbers $w↓1$, $\ldotss$, $w↓r$ such
that $0 ≤ w↓j ≤ m↓j$. Our job is {\sl to determine the binary
representation of the number $w$ that satisfies the conditions}
$$\cpile{0 ≤ w < m↓1m↓2 \ldotsm m↓r,\cr\noalign{\vskip4pt}
w ≡ w↓1\modulo {m↓1},\qquad \ldotss ,\qquad
w ≡ w↓r\modulo {m↓r}.\cr}\eqno (25)$$
The method is based on (23) and (24) of Section
\inxf{Chinese remainder theorem}
4.3.2. First we compute
$$w↑{\prime}↓{j} =\biglp\ldotsm((w↓j - w↑{\prime}↓{1})\,c↓{1j}
- w↑{\prime}↓{2})\,c↓{2j} - \cdots - w↑{\prime}↓{j-1}\bigrp\,
c↓{(j-1)j} \mod m↓j,\eqno (26)$$
for $j = 2$, $\ldotss$, $r$, where $w↑{\prime}↓{1}
= w↓1 \mod m↓1$; then we compute
$$w = \biglp \ldotsm(w↑{\prime}↓{r}m↓{r-1} + w↑{\prime}↓{r-1})\,m↓{r-2}
+ \cdots + w↑{\prime}↓{2}\bigrp\, m↓1 + w↑{\prime}↓{1}.\eqno(27)$$
Here $c↓{ij}$ is a number such that $c↓{ij}m↓i
≡ 1\modulo {m↓j}$; these numbers $c↓{ij}$ are not given; they
must be determined from the $e↓j$'s.
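\yyskip In Python, for instance, (26) and (27) read as follows (a straightforward transcription for experimentation, with none of the speedups discussed below; the name mixed_radix is ours, and the $c↓{ij}$ are obtained from the built-in modular inverse, which exists because the moduli are relatively prime in pairs):

    def mixed_radix(ms, ws):
        # given moduli ms as in (24) and residues ws, find w satisfying (25)
        r = len(ms)
        wp = [ws[0] % ms[0]]                  # w'_1
        for j in range(1, r):                 # Eq. (26)
            t = ws[j]
            for i in range(j):
                cij = pow(ms[i], -1, ms[j])   # c_ij m_i = 1 (modulo m_j)
                t = (t - wp[i]) * cij % ms[j]
            wp.append(t)
        w = 0                                 # Eq. (27), from the inside out
        for j in reversed(range(r)):
            w = w * ms[j] + wp[j]
        return w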
The calculation of (26) for all $j$ involves $r\choose 2$ additions
modulo $m↓j$, each of which takes $O(e↓r)$ cycles, plus $r\choose 2$
multiplications by $c↓{ij}$, modulo $m↓j$. The calculation of
$w$ by formula (27) involves $r$ additions and $r$ multiplications
by $m↓j$; it is easy to multiply by $m↓j$, since this is just
adding, shifting, and subtracting, so it is clear that the evaluation
of Eq.\ (27) takes $O(r↑2e↓r)$ cycles. We will soon see that
each of the multiplications by $c↓{ij}$, modulo $m↓j$, requires
only $O(e↓r \log e↓r)$ cycles, and therefore {\sl it is possible to
complete the entire
job of conversion in $O(r↑2e↓r \log e↓r)$ cycles}.
The above observations leave us with the
following problem to solve: Given positive integers $e < f$
and a nonnegative integer $u < 2↑f$, compute the value of $(cu)\mod(2↑f
- 1)$, where $c$ is the number such that $(2↑e - 1)c ≡ 1\modulo
{2↑f - 1}$; and the computation must be done in $O(f \log f)$ cycles. The
result of exercise 4.3.2--6 gives a formula for $c$ that suggests
a suitable procedure. First we find the least positive
integer $b$ such that
$$be ≡ 1\modulo f.\eqno (28)$$
This can be done using \α{Euclid's algorithm} in $O\biglp
(\log f)↑3\bigrp$ cycles, since Euclid's algorithm applied
to $e$ and $f$ requires $O(\log f)$ iterations, and each iteration
requires $O\biglp (\log f)↑2\bigrp$ cycles; alternatively,
we could be very sloppy here without violating the total time
constraint, by simply trying $b = 1$, 2, etc., until (28) is
satisfied, since such a process would take $O(f \log f)$ cycles
in all. Once $b$ has been found, exercise 4.3.2--6 tells us
that
$$c = c[b] =\bigglp\sum ↓{0≤j<b}2↑{je}\biggrp\mod(2↑f - 1).\eqno(29)$$
A brute-force multiplication of $(cu)
\mod (2↑f - 1)$ would not be good enough to solve the problem,
since we do not know how to multiply general $f$-bit numbers
in $O(f \log f)$ cycles. But the special form of $c$ provides
a clue: The binary representation of $c$ is composed of bits
in a regular pattern, and Eq.\ (29) shows that the number $c[2b]$
can be obtained in a simple way from $c[b]$. This suggests
that we can rapidly multiply a number $u$ by $c[b]$ if we
build $c[b]u$ up in $\lg b$ steps in a suitably clever manner,
such as the following: Let the binary notation for $b$ be
$$b = (b↓s \ldotsm b↓2b↓1b↓0)↓2;$$
we may calculate the sequences $a↓k$, $d↓k$, $u↓k$,
$v↓k$ defined by the rules
$$\eqalign{a↓0 ⊗= e,\cr
d↓0⊗=b↓0e,\cr
u↓0⊗=u,\cr
v↓0⊗=b↓0u,\cr}\qquad\eqalign{a↓k ⊗= 2a↓{k-1} \mod f;\cr
d↓k ⊗= (d↓{k-1} + b↓{k\,}a↓k)\mod f;\cr
u↓k ⊗= (u↓{k-1} + 2↑{a↓{k-1}}u↓{k-1})\mod(2↑f-1);\cr
v↓k ⊗= (v↓{k-1} + b↓{k\,}2↑{d↓{k-1}}u↓k)\mod(2↑f-1).\cr}\eqno(30)$$
It is easy to prove by induction on $k$ that
$$\eqalign{a↓k ⊗= (2↑ke)\mod f;\cr
d↓k⊗=\biglp(b↓k\ldotsm b↓1b↓0)↓2\,e\bigrp\mod f;\cr}\qquad\eqalign{
u↓k⊗=(c[2↑{k\,}]u)\mod(2↑f-1);\cr
v↓k ⊗= \biglp c[(b↓k \ldotsm b↓1b↓0)↓2]u\bigrp \mod(2↑f - 1).\cr}\eqno (31)$$
Hence the desired result, $(c[b]u)\mod(2↑f - 1)$,
is $v↓s$. The calculation of $a↓k$, $d↓k$, $u↓k$, $v↓k$ from $a↓{k-1}$,
$d↓{k-1}$, $u↓{k-1}$, $v↓{k-1}$ takes $O(\log f)+O(\log f)+O(f)+O(f)=O(f)$
cycles, and therefore the entire calculation can be done in $s\,O(f)=O(f\log f)$
cycles as desired.
The reader will find it instructive to study the ingenious method
represented by (30) and (31) very carefully. Similar techniques
are discussed in Section 4.6.3.
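\yyskip A transcription of (30) into Python may help in checking the induction (31); the routine mulcb below is our own illustration, not a fast implementation. The shifts play the role of the multiplications by powers of 2; a genuinely fast routine would reduce mod $2↑f-1$ by cyclic shifting, making each step cost $O(f)$, whereas this sketch simply leans on built-in long division:

    def mulcb(u, b, e, f):
        # returns (c[b]*u) mod (2^f - 1), with c[b] as in (29);
        # assumes b >= 1 and 0 <= u < 2^f - 1
        m = (1 << f) - 1
        a, d = e % f, (b & 1) * e % f        # a_0 and d_0
        uk, vk = u, (b & 1) * u              # u_0 and v_0
        for k in range(1, b.bit_length()):
            bk = (b >> k) & 1
            a0, d0 = a, d                    # a_{k-1} and d_{k-1}
            a = 2 * a0 % f
            d = (d0 + bk * a) % f
            uk = (uk + (uk << a0)) % m       # u_k = (c[2^k]u) mod (2^f - 1)
            vk = (vk + bk * (uk << d0)) % m  # v_k, by (31)
        return vk                            # = v_s = (c[b]u) mod (2^f - 1)

For example, when $e=2$ and $f=5$ we have $b=3$ and $c=c[3]=21$; mulcb(7, 3, 2, 5) returns 23, which is indeed $(21\cdot7)\mod 31$.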
% Vicinity of page 289
% folio 385 galley 14 ,1979 *
\def\vchop#1{\vbox to 0pt{\vskip 0pt minus 100pt\hbox{#1}}}
Sch\"onhage's paper [{\sl Computing \bf 1} (1966), 182--196]
shows that these ideas can be extended to the multiplication
of $n$-bit numbers using \vchop{$r \approx 2↑{\sqrt{\chop to 0pt{\scriptstyle
2\lg n}}}$} moduli, obtaining
a method analogous to Algorithm↔C. We shall not dwell on the
details here, since Algorithm↔C is always superior; in fact,
an even better method is next on our agenda.
\subsectionbegin{C. Use of discrete Fourier transforms} The
critical problem in high-precision multiplication is the determination
of ``\β{convolution} products'' such as
$$u↓rv↓0 + u↓{r-1}v↓1 + \cdots + u↓0v↓r,$$
and there is an intimate relation between
convolutions and an important mathematical concept called ``Fourier transformation.''
If $\omega =\exp(2πi/K)$
is a $K$th root of unity, the one-dimensional \β{Fourier transform}
of the sequence of complex numbers
$(u↓0, u↓1, \ldotss , u↓{K-1})$ is defined to be the sequence $(
{\A u}↓0,{\A u}↓1, \ldotss , {\A u}↓{\!K-1})$, where
$$\chop to 12pt{{\A u}↓s = \sum ↓{0≤t<K}\omega ↑{st}u↓t,\qquad 0≤s<K.}\eqno(32)$$
Letting $({\A v}↓0, {\A v}↓1, \ldotss ,
{\A v}↓{K-1})$ be defined in the same way, as the Fourier transform of $(v↓0,
v↓1, \ldotss , v↓{K-1})$, it is not difficult to see that $(
{\A u}↓0{\A v}↓0, {\A u}↓1{\A v}↓1, \ldotss , {\A u}↓{\!K-1}
{\A v}↓{\!K-1})$ is the transform of $(w↓0, w↓1, \ldotss , w↓{K-1})$,
where
$$\eqalign{w↓r⊗ = u↓rv↓0 + u↓{r-1}v↓1 + \cdots + u↓0v↓r + u↓{K-1}v↓{r+1}
+ \cdots + u↓{r+1}v↓{K-1}\cr
\noalign{\vskip 7pt}
⊗ =\sum ↓{i+j≡r\,\modulo K}u↓iv↓j.\cr}$$
When $K ≥ 2n-1$ and $u↓n = u↓{n+1} = \cdots = u↓{K-1}
= v↓n = v↓{n+1} = \cdots = v↓{K-1} = 0$, the $w$'s are just what we
need for multiplication, since the terms
$u↓{K-1}v↓{r+1}+\cdots+
u↓{r+1}v↓{K-1}$ vanish when $0≤r≤2n-2$. In other words,
{\sl the transform of a convolution
product is the ordinary product of the transforms.} This idea
is actually a special case of \α{Toom}'s use of polynomials $\biglp$cf.\ (10)$\bigrp
$, with $x$ replaced by roots of unity.
If $K$ is a power of 2, the discrete Fourier transform (32) can be obtained
quite rapidly when the computations are arranged in a certain way, and so can the
inverse transform (determining the $w$'s from the $\A w$'s). This
property of Fourier transforms was exploited by V.
\β{Strassen} in 1968, who discovered how to multiply large numbers faster
than was possible under all previously known schemes.
He and A. Sch\"onhage later refined the method and published improved
procedures in {\sl Computing \bf 7} (1971), 281--292. In order to understand their
\inx{FFT, see Fast Fourier Transform}
approach to the problem, let us first take a look at the mechanism of \β{fast
Fourier transform}s.
Given a sequence of $K=2↑k$ complex numbers $(u↓0,\ldotss,u↓{K-1})$, and given
the complex
number$$\omega=\exp(2πi/K),$$ the sequence $(\A u↓0,\ldotss,\A u↓{K-1})$ defined in
(32) can be calculated rapidly by carrying out the following scheme.\xskip(In these
formulas the parameters $s↓j$ and $t↓j$ are either 0 or 1, so that each ``pass''
represents $2↑k$ computations.)
\yyskip\noindent Pass 0.\xskip Let $A↑{[0]}(t↓{k-1},
\ldotss , t↓0) = u↓t$, where $t = (t↓{k-1} \ldotsm t↓0)↓2$.
\yskip\noindent Pass 1.\xskip Set $A↑{[1]}(s↓{k-1},t↓{k-2}, \ldotss , t↓0)←$
\penalty1000\vskip2pt
\hbox to size{\hfill$ A↑{[0]}(0, t↓{k-2}, \ldotss , t↓0) +
\omega ↑{(s↓{k-1}0\ldotsm0)↓2} \cdot A↑{[0]}(1, t↓{k-2}, \ldotss
, t↓0)$.}
\yskip\noindent Pass 2.\xskip Set $A↑{[2]}(s↓{k-1},
s↓{k-2}, t↓{k-3}, \ldotss , t↓0)←$
\penalty1000\vskip2pt
\hbox to size{\hfill$ A↑{[1]}(s↓{k-1}, 0, t↓{k-3}, \ldotss
, t↓0) + \omega ↑{(s↓{k-2}s↓{k-1}0 \ldotsm 0)↓2} \cdot A↑{[1]}(s↓{k-1},
1, t↓{k-3}, \ldotss , t↓0)$.}
\yskip$\ldots$
\yskip\noindent Pass $k$.\xskip Set $A↑{[k]}(s↓{k-1},
\ldotss , s↓1, s↓0)←$
\penalty1000\vskip2pt
\hbox to size{\hfill$ A↑{[k-1]}(s↓{k-1}, \ldotss , s↓1, 0)
+ \omega ↑{(s↓0s↓1 \ldotsm s↓{k-1})↓2} \cdot A↑{[k-1]}(s↓{k-1},
\ldotss , s↓1, 1)$.}
\yyskip\noindent It is fairly easy to prove by induction that
we have
$$\twoline{A↑{[j]}(s↓{k-1}, \ldotss , s↓{k-j}, t↓{k-j-1}, \ldotss
, t↓0)}{7pt}{\null=\chop to 9pt{\sum ↓{0≤t↓{k-1}, \ldotss , t↓{k-j}≤1}\omega
↑{(s↓0s↓1 \ldotsm s↓{k-1})↓2 \cdot (t↓{k-1} \ldotsm t↓{k-j}0 \ldotsm
0)↓2}\, u↓t,\qquad(33)\hskip-1em}}$$
so that
$$A↑{[k]}(s↓{k-1}, \ldotss , s↓1, s↓0) = {\A u}↓s,\qquad
\hbox{where $s = (s↓0s↓1 \ldotsm s↓{k-1})↓2$.}\eqno(34)$$
(Note the reversed order of the binary digits in
$s$. For further discussion of transforms such as this, see
Section 4.6.4.)
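\yyskip The passes above can be transcribed almost literally into a program. The following Python sketch (complex floating point throughout, with no claim of efficiency; the bit-reversal helper expression is ours) produces the array $A↑{[k]}$, so that by (34) the value $\A u↓s$ appears in the position whose binary digits are those of $s$ reversed:

    import cmath

    def fft_passes(u):
        K = len(u)                       # assumed to be 2^k
        k = K.bit_length() - 1
        omega = cmath.exp(2j * cmath.pi / K)
        A = list(u)                      # Pass 0
        for j in range(1, k + 1):        # Pass j
            B = [0j] * K
            for i in range(K):
                s = i >> (k - j)                 # bits s_{k-1} ... s_{k-j}
                rev = int(format(s | (1 << j), 'b')[:0:-1], 2)
                e = rev << (k - j)       # exponent (s_{k-j}...s_{k-1}0...0)_2
                t = i & ((1 << (k - j)) - 1)     # bits t_{k-j-1} ... t_0
                base = (s >> 1) << (k - j + 1)   # bits s_{k-1} ... s_{k-j+1}
                B[i] = A[base | t] + omega**e * A[base | (1 << (k - j)) | t]
            A = B
        return A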
\penalty-200 % good place for a break (March 7, 1979)
To get the \α{inverse Fourier transform} $(u↓0,\ldotss,u↓{K-1})$ from the values of
$(\A u↓0,\ldotss,\A u↓{K-1})$, we may note that the ``double transform'' is
$$\twoline{\hskip1pt \A{\A{\hskip-1pt u}}↓r=
\sum↓{0≤s<K}\omega↑{rs}\A u↓s=\sum↓{0≤s,t<K}\omega↑{rs}\omega
↑{st}u↓t}{-3pt}{=\sum↓{0≤t<K}u↓t\,\,\bigglp\sum↓{0≤s<K}\omega↑{s(t+r)}\biggrp
=Ku↓{\,(-r)\mod K},}$$ since the geometric series $\sum↓{0≤s<K}\omega↑{sj}$ sums to
zero unless $j$ is a multiple of↔$K$. Therefore the inverse transform can be
computed in the same way as the transform itself, except that the final results
must be divided by $K$ and shuffled slightly.
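\yyskip Thus, for example, a transform routine in natural order and its inverse can be built from the fft_passes sketch above; the ``shuffling'' consists of the index reversal noted after (34) together with the map $r\mapsto(-r)\mod K$:

    def fft(u):                   # the transform (32), in natural order
        A, K = fft_passes(u), len(u)
        return [A[int(format(s | K, 'b')[:0:-1], 2)] for s in range(K)]

    def ifft(uhat):               # transform again, divide by K, shuffle
        K = len(uhat)
        uu = fft(uhat)            # the "double transform"
        return [uu[-t % K] / K for t in range(K)]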
Applying this to the problem of integer multiplication, suppose we wish to compute
the product of two $n$-bit integers $u$ and $v$. As in Algorithm↔C we shall work
with groups of bits; let
$$2n≤2↑k\,l<4n,\qquad K=2↑k,\qquad L=2↑l,\eqno(35)$$
and write
$$u=(U↓{K-1}\ldotsm U↓1U↓0)↓L,\qquad v=(V↓{K-1}\ldotsm V↓1V↓0)↓L,\eqno(36)$$
regarding $u$ and $v$ as $K$-place numbers in radix $L$ so that each digit $U↓j$
or $V↓j$ is an $l$-bit integer. Actually the leading digits $U↓j$ and $V↓j$ are
zero for all $j≥K/2$, because $2↑{k-1}l≥n$. We will select appropriate values for
$k$ and $l$ later; at the moment our goal is to see what happens in general,
so that we can choose $k$ and↔$l$ intelligently when all the facts are before us.
The next step of the multiplication procedure is to compute the Fourier transforms
$(\A u↓0,\ldotss,\A u↓{K-1})$ and $(\A v↓0,\ldotss,\A v↓{K-1})$ of the
sequences $(u↓0,\ldotss,u↓{K-1})$ and $(v↓0,\ldotss,v↓{K-1})$, where
we define
$$u↓t=U↓t/2↑{k+l},\qquad v↓t=V↓t/2↑{k+l}.\eqno(37)$$
This scaling is done for convenience so that the absolute values $|u↓t|$ and
$|v↓t|$ are less than $2↑{-k}$, ensuring that $|\A u↓s|$ and $|\A v↓s|$ will be
less than 1 for all $s$.
An obvious problem arises here, since the complex number $\omega$ can't be
represented exactly in binary notation. How are we going to compute a reliable
Fourier transform? By a stroke of good luck, it turns out that everything
will work properly if we do the calculations with only a modest amount of
precision. For the moment let us bypass this question and assume that
infinite-precision calculations are being performed; we shall analyze later how
much accuracy is actually needed.
Once the $\A u↓s$ and $\A v↓s$ have been found, we let $\A w↓s=\A u↓s \A v↓s$
for $0≤s<K$ and determine the inverse Fourier transform $(w↓0,\ldotss,w↓{K-1})$.
As explained above, we now have
$$w↓r=\sum↓{i+j=r}u↓iv↓j = \sum↓{i+j=r}U↓iV↓j/2↑{2k+2l},$$
so the integers $W↓r=2↑{2k+2l}w↓r$ are the coefficients in the desired product
$$u\cdot v=W↓{K-2\,}L↑{K-2}+\cdots+W↓1L+W↓0.\eqno(38)$$
Since $0≤W↓r<(r+1)L↑2<KL↑2$, each $W↓r$ has at most $k+2l$ bits, so it will not be
difficult to compute the binary representation when the $W$'s are known unless
$k$ is large compared to $l$.
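\yyskip Putting the pieces together, a toy version of the whole procedure (35)--(38) looks like this in Python, using the fft and ifft sketches above; ordinary double-precision floating point stands in for the $m$-bit fixed point arithmetic analyzed below, so the sketch is trustworthy only for modest values of $n$, and the parameter choice is merely one convenient way to satisfy the constraints:

    def ssmul(u, v, l=8):
        n = max(u.bit_length(), v.bit_length(), 1)
        digits = -(-n // l)                    # each input has <= K/2 digits
        k = (2 * digits - 1).bit_length()      # least k with 2^k >= 2*digits
        K, L = 1 << k, 1 << l
        U = [(u >> (l*t)) & (L - 1) for t in range(K)]    # as in (36)
        V = [(v >> (l*t)) & (L - 1) for t in range(K)]
        scale = 1 << (k + l)                   # the scaling (37)
        uhat = fft([x / scale for x in U])
        vhat = fft([x / scale for x in V])
        w = ifft([a * b for a, b in zip(uhat, vhat)])
        W = [int(round(x.real * (1 << (2*k + 2*l)))) for x in w]
        return sum(Wr << (l*r) for r, Wr in enumerate(W))  # Eq. (38)

For instance, ssmul(2718, 4742) returns $12888756=2718\cdot4742$.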
% Vicinity of page 292
% New copy replacing galley 15 (C) Addison-Wesley 1979
Let us try to estimate how much time this method takes, if $m$-bit \β{fixed point
arithmetic} is used in calculating the Fourier transforms. Exercise↔10 shows
that all of the quantities $A↑{[j]}$ during all the passes of the transform
calculations will be less than 1 in magnitude because of the scaling (37), hence
it suffices to deal with $m$-bit fractions $(.a↓{-1}\ldotsm a↓{-m})↓2$ for the
real and imaginary parts of all the intermediate quantities. Simplifications are
possible because the inputs $u↓t$ and↔$v↓t$ are real-valued; only $K$ real
values instead of $2K$ need to be carried in each step (see exercise 4.6.4--14).
We will ignore such refinements in order to keep complications to a minimum.
The first job is to compute $\omega$ and its powers. For simplicity we shall
make a table of the values $\omega↑0$, $\ldotss$, $\omega↑{K-1}$. Let$$\omega↓r=
\exp(2πi/2↑r),$$so that $\omega↓1=-1$, $\omega↓2=i$, $\omega↓3=(1+i)/\sqrt2$,
$\ldotss$, $\omega↓k=\omega$. If $\omega↓r=x↓r+iy↓r$, we have
$$\omega↓{r+1}=\sqrt{1+x↓r\over2}+i\sqrt{1-x↓r\over2}.\eqno(39)$$
The calculation of $\omega↓1$, $\omega↓2$, $\ldotss$, $\omega↓k$ takes negligible
time compared with the other computations we need, so we can use any straightforward
algorithm for square roots. Once the $\omega↓r$ have been calculated we can
compute all of the powers $\omega↑j$ by starting with $\omega↑0=1$ and using the
following idea for $j>0$: If $j=2↑{k-r}\cdot q$ where $q$ is odd, and if $j↓0=
2↑{k-r}\cdot(q-1)$, we have
$$\omega↑j=\omega↑{j↓0}\cdot\omega↓r.\eqno(40)$$
This method of calculation keeps errors from propagating, since each $\omega↑j$
is a product of at most $k$ of the $\omega↓r$'s. The total time to calculate all
the $\omega↑j$ is $O(KM)$, where $M$ is the time to do an $m$-bit complex
multiplication; this is less time than the subsequent steps will require, so we
can ignore it.
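\yyskip In Python the table-building might look as follows (math.sqrt standing in for whatever straightforward square-root algorithm is used); counting the trailing zero bits of $j$ locates the $r$ with $j=2↑{k-r}\cdot q$, $q$ odd:

    from math import sqrt

    def omega_powers(k):
        K = 1 << k
        om = [None, complex(-1, 0)]            # omega_1 = -1
        for r in range(2, k + 1):              # Eq. (39)
            x = om[-1].real
            om.append(complex(sqrt((1 + x) / 2), sqrt((1 - x) / 2)))
        pw = [complex(1, 0)] * K               # omega^0 = 1
        for j in range(1, K):
            low = j & -j                       # = 2^{k-r}, so q = j / low
            r = k - low.bit_length() + 1
            pw[j] = pw[j - low] * om[r]        # Eq. (40)
        return pw                              # pw[j] = omega^j for 0 <= j < K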
Each of the three Fourier transformations comprises $k$ passes, each of which
involves $K$ operations of the form $a←b+\omega↑j c$, so the total time to calculate
the Fourier transforms is $$O(kKM)=O(Mnk/l).$$Finally, the work involved in
computing the binary digits of $u\cdot v$ using (38) is $O\biglp K(k+l)\bigrp
=O(n+nk/l)$. Summing over all operations, we find that the total time to multiply
$n$-bit numbers $u$ and $v$ will be $O(n)+O(Mnk/l)$.
Now let's see how large the intermediate precision $m$ needs to be, so that we
know how large $M$ needs to be. For simplicity we shall be content with safe
estimates of the accuracy, instead of finding the best possible bounds. It
\inx{error estimates}
\inx{absolute error}
will be convenient to compute all the $\omega↑j$ so that our approximations
$(\omega↑j)↑\prime$ will satisfy $|(\omega↑j)↑\prime|≤1$; this condition is easy
to guarantee if we \α{truncate} towards zero instead of rounding.
The operations we need to perform with $m$-bit fixed point complex arithmetic
are all obtained by replacing an exact computation of the form $a←b+\omega↑jc$
by the approximate computation
$$a↑\prime←\hbox{truncate}\biglp b↑\prime+(\omega↑j)↑\prime c↑\prime\bigrp,
\eqno(41)$$
where $b↑\prime$, $(\omega↑j)↑\prime$, and $c↑\prime$ are previously computed
approximations; all of these complex numbers and their approximations are bounded
by 1 in absolute value. If $|b↑\prime-b|≤\delta↓1$, $|(\omega↑j)↑\prime-
\omega↑j|≤\delta↓2$, and $|c↑\prime-c|≤\delta↓3$, it is not difficult to
see that we will have $|a↑\prime-a|<\delta+\delta↓1+\delta↓2+\delta↓3,$ where
$$\delta=|2↑{-m}+2↑{-m}\,i|=2↑{1/2-m},$$
because we have $\left|(\omega↑j)↑\prime c↑\prime-\omega↑jc\right|=
\left|\biglp(\omega↑j)↑\prime-\omega↑j\bigrp c↑\prime+\omega↑j(c↑\prime-c)\right|
≤\delta↓2+\delta↓3$, and
$\delta$ is the maximum truncation error. The approximations $(\omega↑j)↑\prime$
are obtained by starting with approximate values $\omega↓{\!r}↑\prime$ to the
numbers defined in (39), and we may assume that $|\omega↓{\!r}↑\prime-\omega↓r|
≤\delta$. Each multiplication (40) has the form of (41) with $b↑\prime=0$,
so an additional error of at most $2\delta$ is made per multiplication, and we
have $\left|(\omega↑j)↑\prime-\omega↑j\right|≤(2k-1)\delta$ for all $j$.
If we have errors of at most $ε$ before any pass of the fast Fourier transform,
the operations of that pass therefore have the form (41) where $\delta↓1=
\delta↓3=ε$ and $\delta↓2=(2k-1)\delta$, and the errors after the pass will be
at most $2ε+2k\delta$. There is no error in ``Pass 0,'' so we find by
induction on $j$ that the maximum error
after ``Pass $j$'' is bounded by $(2↑j-1)\cdot2k\delta$, and the computed
values of $\A u↓s$ will satisfy $\left|(\A u↓s)↑\prime-\A u↓s\right|<(2↑k-1)\cdot
2k\delta$. A similar
formula will hold for $(\A v↓s)↑\prime$; and we will have $$|(\A w↓s)↑
\prime-\A w↓s|<2(2↑k-1)\cdot2k\delta+\delta.$$ During the inverse transformation
there is an additional accumulation of errors, but the division by $K=2↑k$
ameliorates most of this; by the same argument we find that
the computed values $w↓{\!r}↑\prime$ will satisfy
$$|w↓{\!r}↑\prime-w↓r|<4k2↑k\delta.$$We need enough precision to make
$2↑{2k+2l\,}w↓{\!r}↑\prime$ round to the correct integer $W↓r$, hence we need
$$2↑{2k+2l+2+\lg k+k+1/2-m}≤{1\over2},$$ i.e.,
$m≥3k+2l+\lg k+7/2$. This will hold if we simply require that
$$k≥7\qquad\hbox{and}\qquad m≥4k+2l.\eqno(42)$$
Relations (35) and (42) can be used to determine parameters $k$, $l$, $m$ so
that multiplication takes $O(n)+O(Mnk/l)$ units of time, where $M$ is the
time to multiply $m$-bit fractions.
If we are using \MIX, for example, suppose we want to multiply binary num\-bers
having $n=2↑{13}=8192$ bits each. We can choose $k=11$, $l=8$, $m=60$, so that
the necessary $m$-bit operations are nothing more than double precision arithmetic.
The running time $M$ needed to do fixed point $m$-bit complex multi\-plication
will therefore be comparatively small. With triple-precision operations we
can go up, for example,
to $k=l=15$, $n≤15\cdot2↑{14}$, which takes us way beyond
\MIX's memory capacity.
Further study of the choice of $k$, $l$, and $m$ leads in fact to a rather
surprising conclusion:\xskip{\sl For all practical purposes we can assume that $M$ is
constant, and the Sch\"onhage--Strassen multiplication technique will have a
running time linearly proportional to $n$.}\xskip The reason is that we can choose
$k=l$ and $m=6k$; this choice of $k$ is always less than $\lg n$, so we will never
need to use more than sextuple precision unless $n$ is larger than the word size
of our computer.\xskip(In particular, $n$ would have to be larger than the
capacity of an index register, so we probably couldn't
fit the numbers $u$ and $v$ in main memory.)
The practical problem of fast multiplication is therefore solved, except for
improvements in the constant factor.
In fact, the all-integer convolution algorithm of exercise 4.6.4--59 is
probably a better choice for practical high-precision multiplication, even though
it has a slightly worse asymptotic behavior.
Our interest in multiplying large
numbers is partly theoretical, however, because it is interesting to explore the
ultimate limits of computational complexity. So let's forget practical
considerations and suppose that $n$ is extremely huge, perhaps much larger than
the number of atoms in the universe. We can let $m$ be approximately $6\lg n$, and
use the same algorithm recursively to do the $m$-bit multiplications. The
running time will satisfy $T(n)=O\biglp nT(\log n)\bigrp$; hence
$$T(n) ≤ C\,n(C\lg n)(C\lg\lg n)(C\lg\lg\lg n)\ldotsm,$$
where the product continues until reaching a factor with $\lg\ldots\lg n≤1$.
Sch\"onhage and Strassen showed how to improve this theoretical upper bound to
$O(n\log n\log\log n)$ in their paper, by using {\sl integer} numbers
$\omega$ to carry out fast Fourier transforms on integers, modulo numbers of the
form $2↑e+1$. This upper bound applies to Turing machines, i.e., to computers with
bounded memory and a finite number of arbitrarily long tapes.
If we allow ourselves a more powerful computer, with random access to any number of
words of bounded size, Sch\"onhage has pointed out that the upper bound drops to
$O(n\log n)$. For we can choose $k=l$ and $m=6k$, and we have time to build a
complete multiplication table of all possible products $xy$ for
$0≤x,y<2↑{\lceil m/12\rceil}$.\xskip(The number of such products is $2↑k$ or
$2↑{k+1}$, and we can compute each table entry by addition from one of its
predecessors in $O(k)$ steps, hence $O(k2↑k)=O(n)$ steps will suffice for the
calculation.)\xskip In this case $M$ is the time needed to do 12-place arithmetic in
radix $2↑{\lceil m/12\rceil}$, and it follows that $M=O(k)=O(\log n)$ because
\hbox{1-place} multiplication can be done by table lookup.
Sch\"onhage discovered in 1979 that a {\sl \α{pointer machine}} can carry out $n$-bit
multiplication in $O(n)$ steps; see exercise↔12. Such devices (which are
also called ``\α{storage modification machines}'' and ``\α{linking automata}'') seem to
provide the best models of computation when $n→∞$, as discussed at the end of
Section 2.6. So we can conclude that multiplication in $O(n)$ steps is possible
for theoretical purposes as well as in practice.
% Vicinity of page 295
% folio 392 galley 16 ,1979 *
\subsectionbegin{D. Division} Now that we have efficient routines for
multiplication, let's consider the inverse problem. It turns out that \β{division}
can be performed just as fast as multiplication, except for a constant factor.
To divide an $n$-bit number $u$ by an $n$-bit
number $v$, we may first find an $n$-bit approximation to $1/v$,
then multiply by $u$ to get an approximation $\A q$ to
$u/v$; finally, we can make the slight correction necessary
to $\A q$ to ensure that $0 ≤ u - qv < v$ by using another
multiplication. From this reasoning, we see that it suffices
to have an efficient algorithm for approximating the reciprocal of an
$n$-bit number. The following algorithm does this, using
``\α{Newton's method}'' as explained at the end of Section 4.3.1.
\penalty-300 % break here if close (April 1980)
\algbegin Algorithm R (High-precision \α{reciprocal}).
Let $v$ have the binary representation $v = (0.v↓1v↓2v↓3\ldotsm
)↓2$, where $v↓1 = 1$. This algorithm computes an approximation
$z$ to $1/v$, such that
$$|z - 1/v| ≤ 2↑{-n}.\eqno(43)$$
\algstep R1. [Initial approximation.] Set
$z ← {1\over 4}\lfloor 32/(4v↓1 + 2v↓2 + v↓3)\rfloor$ and $k ← 0$.
\algstep R2. [Newtonian iteration.] (At this point we have a
number $z$ of the binary form $(xx.xx \ldotsm x)↓2$ with $2↑k
+ 1$ places after the radix point, and $z ≤ 2$.)\xskip Calculate $z↑2
= (xxx.xx \ldotsm x)↓2$ exactly, using a high-speed multiplication
routine. Then calculate $V↓{k\,}z↑2$ exactly, where $V↓k = (0.v↓1v↓2
\ldotsm v↓{2↑{k+1}+3})↓2$. Then set $z
← 2z - V↓{k\,}z↑2 + r$, where $0 ≤ r < 2↑{-2↑{k+1}-1}$
is added if necessary to ``round up'' $z$ so that it is a
multiple of $2↑{-2↑{k+1}-1}$. Finally,
set $k ← k + 1$.
\algstep R3. [Test for end.] If $2↑k < n$, go back to step R2;
otherwise the algorithm terminates.\quad\blackslug
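\yyskip The following transcription of Algorithm R into Python may clarify the bookkeeping with $V↓k$ and the rounding amount $r$; exact rational arithmetic (the fractions module) stands in for the multiple-precision fixed point operations that an actual implementation would use, and the name algorithm_r is ours:

    from fractions import Fraction
    from math import ceil

    def algorithm_r(v, n):
        # v: a Fraction with 1/2 <= v < 1; returns z with |z - 1/v| <= 2^(-n)
        v123 = int(v * 8)                      # (v_1 v_2 v_3)_2, where v_1 = 1
        z = Fraction(32 // v123, 4)            # step R1
        k = 0
        while (1 << k) < n:                    # step R3
            p = (1 << (k + 1)) + 3
            Vk = Fraction(int(v * (1 << p)), 1 << p)    # v truncated, p places
            q = 1 << ((1 << (k + 1)) + 1)
            z = Fraction(ceil((2*z - Vk*z*z) * q), q)   # step R2, rounded up
            k += 1
        return z

For example, algorithm_r(Fraction(3, 4), 16) returns a value within $2↑{-16}$ of $4\over3$.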
\yyskip This algorithm is based on
a suggestion by S. A. \α{Cook}. A similar
technique has been used in computer hardware [see \α{Anderson},
\α{Earle,} \α{Goldschmidt,} and \α{Powers,} {\sl IBM J. Res.\ Dev.\ \bf 11}
(1967), 48--52]. Of course, it is necessary to check the accuracy
of Algorithm↔R quite carefully, because it comes very close
to being inaccurate. We will prove by induction that
$$z ≤ 2\qquad\hbox{and}\qquad |z - 1/v| ≤ 2↑{-2↑k}\eqno(44)$$
at the beginning and end of step R2.
For this purpose, let $\delta ↓k = 1/v - z↓k$, where
$z↓k$ is the value of $z$ after
$k$ iterations of step R2. To start the induction on $k$, we
have$$\delta ↓0 = 1/v - 8/v↑\prime + {(32/v↑\prime - \lfloor
32/v↑\prime \rfloor )/4} = \eta ↓1 + \eta ↓2,$$
where $v↑\prime= (v↓1v↓2v↓3)↓2$ and $\eta ↓1 = (v↑\prime - 8v)/vv↑\prime$,
so that we have
$-{1\over 2} < \eta ↓1 ≤ 0$ and $0 ≤ \eta ↓2 < {1\over 4}$.
Hence $|\delta ↓0| < {1\over 2}$. Now suppose that (44) has been
verified for↔$k$; then
$$\baselineskip14pt
\eqalign{\delta↓{k+1} = 1/v - z↓{k+1} ⊗= 1/v - z↓k - z↓k(1- z↓kV↓k) - r\cr
⊗ = \delta ↓k - z↓k(1 - z↓kv) - z↑{2}↓{k}(v - V↓k) - r\cr
⊗ = \delta ↓k - (1/v - \delta ↓k)v\delta ↓k - z↑{2}↓{k}(v - V↓k) - r\cr
⊗ = v\delta ↑{2}↓{k} - z↑{2}↓{k}(v - V↓k) - r.\cr}$$
Now
$$0 ≤ v\delta ↑{2}↓{k} < \delta ↑{2}↓{k} ≤ (2↑{-2↑k})↑2
= 2↑{-2↑{k+1}},$$
and
$$0 ≤ z↑{2}↓{k}(v - V↓k) + r < 4(2↑{-2↑{k+1}-3})+2↑{-2↑{k+1}-1}=2↑{-2↑{k+1}},$$
so $|\delta ↓{k+1}| ≤ 2↑{-2↑{k+1}}$.
We must still verify the first inequality of (44); to show
that $z↓{k+1} ≤ 2$, there are three cases:\xskip (a) $V↓k = {1\over
2}$; then $z↓{k+1} = 2$.\xskip (b) $V↓k ≠ {1\over 2} = V↓{k-1}$; then
$z↓k = 2$, so $2z↓k - z↑{2}↓{k}V↓k ≤ 2 - 2↑{-2↑{k+1}-1}$.\xskip
(c) $V↓{k-1} ≠ {1\over 2}$; then $z↓{k+1}
= 1/v - \delta ↓{k+1} < 2 - 2↑{-2↑{k+1}}≤2$, since $k>0$.
The running time of Algorithm↔R is bounded by
$$\textstyle 2T(4n) + 2T(2n) + 2T(n) + 2T({1\over 2}n)+\cdots+O(n)$$
steps, where $T(n)$ is an upper bound on the time needed to do a multiplication
of $n$-bit numbers. If $T(n)$ has the form $nf(n)$ for some monotonically
nondecreasing function
$f(n)$, we have$$T(4n) + T(2n) + T(n) + \cdots < T(8n),$$so division can
be done with a speed comparable to that of multiplication except
for a constant factor.
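\yyskip For example, a complete division routine built on this idea might look like the following Python sketch (using the algorithm_r routine given after Algorithm R above; the correction loops run at most a bounded number of times):

    from fractions import Fraction

    def divide(u, v):
        # u, v positive integers; returns (q, r) with u = q*v + r, 0 <= r < v
        e = v.bit_length()                     # so 1/2 <= v/2^e < 1
        z = algorithm_r(Fraction(v, 1 << e), u.bit_length() + 2)
        q = int(u * z / (1 << e))              # approximation to u/v
        while q * v > u: q -= 1                # the "slight correction"
        while (q + 1) * v <= u: q += 1
        return q, u - q * v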
R. P. \α{Brent} has shown that functions such as $\log x$, $\exp x$,
and $\mathop{\hbox{arctan}}x$
can be evaluated to $n$ significant bits in $O\biglp M(n)\log n\bigrp$
steps, if it takes $M(n)$ units of time to multiply $n$-bit numbers [{\sl
JACM \bf23} (1976), 242--251].
\subsectionbegin{E. An even faster multiplication method}
It is natural to wonder if multiplication of $n$-bit numbers
can be accomplished in just $n$ steps. We have come from order
$n↑2$ down to order $n$, so perhaps we can squeeze the
time down to the absolute minimum. In fact,
it is actually possible to output the answer as fast as we input the
digits, if we leave
the domain of conventional computer programming and allow ourselves
to build a computer that has an unlimited number of components
all acting at once.
A {\sl \β{linear iterative array}} of \β{automata} is a set of devices
$M↓1$, $M↓2$, $M↓3$, $\ldots$ that can each be in a finite set of
``states'' at each step of a computation. The machines $M↓2$,
$M↓3$, $\ldots$ all have {\sl identical} circuitry, and their state
\inxf{hardware, suitable algorithms for}
at time $t + 1$ is a function of their own state at time $t$
as well as the states of their left and right neighbors at time
$t$. The first machine $M↓1$ is slightly different: its state
at time $t + 1$ is a function of its own state and that of $M↓2$,
at time $t$, and also of the {\sl input} at time $t$. The {\sl
output} of a linear iterative array is a function defined on
the states of $M↓1$.
Let $u = (u↓{n-1} \ldotsm u↓1u↓0)↓2$, $v =
(v↓{n-1} \ldotsm v↓1v↓0)↓2$, and $q = (q↓{n-1} \ldotsm q↓1q↓0)↓2$
be binary numbers, and let $uv + q = w = (w↓{2n-1} \ldotsm w↓1w↓0)↓2$.
It is a remarkable fact that a linear iterative array can be constructed,
independent of $n$, that will output $w↓0$, $w↓1$, $w↓2$, $\ldots$
at times 1, 2, 3, $\ldotss$, if it is given the inputs $(u↓0,
v↓0, q↓0)$, $(u↓1, v↓1, q↓1)$, $(u↓2, v↓2, q↓2)$, $\ldots$ at times
0, 1, 2, $\ldotss\,$.
We can state this phenomenon in the language of computer hardware,
by saying that it is possible to design a single ``\α{integrated
circuit module}'' with the following property: If we wire together
sufficiently many of these devices in a straight line, with
each module communicating only with its left and right neighbors,
the resulting circuitry will produce the $2n$-bit product of
$n$-bit numbers in exactly $2n$ clock pulses.
\topinsert{\vbox to 520pt{\hbox{(Table 1 will go on this page,
it's being set separately)}}}
Here is the basic idea behind this construction: At time↔0,
machine $M↓1$ senses $(u↓0, v↓0, q↓0)$ and it therefore is able to output
$(u↓0v↓0 + q↓0)\mod 2$ at time↔1. Then it sees $(u↓1, v↓1, q↓1)$
and it can output $(u↓0v↓1 + u↓1v↓0 + q↓1 + k↓1)\mod 2$, where
$k↓1$ is the ``carry'' left over from the previous step, at
time↔2. Next it sees $(u↓2, v↓2, q↓2)$ and outputs $(u↓0v↓2
+ u↓1v↓1 + u↓2v↓0 + q↓2 + k↓2)\mod 2$; furthermore, its state
holds the values of $u↓2$ and $v↓2$ so that machine $M↓2$
will be able to sense these values at time↔3, and $M↓2$ will
be able to compute $u↓2v↓2$ for the benefit of $M↓1$ at time↔4.
Machine $M↓1$ essentially arranges to start $M↓2$ multiplying the sequence
$(u↓2, v↓2)$, $(u↓3, v↓3)$, $\ldotss $, and $M↓2$ will ultimately
give $M↓3$ the job of multiplying $(u↓4, v↓4)$, $(u↓5, v↓5)$,
etc. Fortunately, things just work out so that no time is lost. The
reader will find it interesting to deduce further details from the formal
description that follows.
% Vicinity of page 297
% folio 394 galley 17 ,1979 *
Each automaton has $2↑{11}$
states$$(c, x↓0, y↓0, x↓1, y↓1, x, y, z↓2, z↓1, z↓0),$$where
$0 ≤ c < 4$ and each of the $x$'s, $y$'s, and $z$'s is either
0 or 1. Initially, all devices are in state $(0, 0, 0, 0, 0,
0, 0, 0, 0, 0)$. Suppose that a machine $M↓j$, for $j > 1$, is in state
$(c, x↓0, y↓0, x↓1, y↓1, x, y, z↓2, z↓1, z↓0)$ at time $t$,
and its left neighbor $M↓{j-1}$ is in state
$(c↑l,x↓0↑l,y↓0↑l,x↓1↑l,y↓1↑l,x↑l,y↑l,z↓2↑l,z↓1↑l,z↓0↑l)$
while its right neighbor $M↓{j+1}$ is in state
$(c↑r,x↓0↑r,y↓0↑r,x↓1↑r,y↓1↑r,x↑r,y↑r,z↓2↑r,z↓1↑r,z↓0↑r)$ at that time.
Then machine $M↓j$ will go into state
$(c↑\prime,x↓0↑\prime,y↓0↑\prime,x↓1↑\prime,y↓1↑\prime,x↑\prime,y↑\prime,
z↓2↑\prime,z↓1↑\prime,z↓0↑\prime)$ at time $t+1$, where
$$\baselineskip14pt
\vcenter{\halign{$\hfill#$⊗$\null#\hfill$\qquad⊗if
$\hfill#$⊗$\null#,\qquad$⊗$#\hfill$\ \ otherwise;\cr
c↑\prime=\min⊗(c+1,3)⊗c↑l⊗=3⊗0\cr
(x↓0↑\prime,y↓0↑\prime)⊗=(x↑l,y↑l)⊗c⊗=0⊗(x↓0,y↓0)\cr
(x↓1↑\prime,y↓1↑\prime)⊗=(x↑l,y↑l)⊗c⊗=1⊗(x↓1,y↓1)\cr
(x↑\prime,y↑\prime)⊗=(x↑l,y↑l)⊗c⊗≥2⊗(x,y)\cr}}\eqno(45)$$
and $(z↑{\prime}↓{2}z↑{\prime}↓{1}z↑{\prime}↓{0})↓2$
is the binary notation for
$$z↓0↑r+z↓1+z↓2↑l+\left\{\,\vcenter{\baselineskip14pt
\halign{$\dispstyle#$,\hfill\quad⊗if $c=#$\hfill\cr
x↑ly↑l⊗0;\cr
x↓0y↑l+x↑ly↓0⊗1;\cr
x↓0y↑l+x↓1y↓1+x↑ly↓0⊗2;\cr
x↓0y↑l+x↓1y+xy↓1+x↑ly↓0⊗3.\cr}}\right.\eqno(46)$$
The leftmost machine $M↓1$ behaves in almost
the same way as the others; it acts exactly as if there were
a machine to its left in state $(3, 0, 0, 0, 0, u, v, q, 0, 0)$
when it is receiving the inputs $(u, v, q)$. The output of the
array is the $z↓0$ component of $M↓1$.
Table 1 shows an example of this array acting
on the inputs $$u = v = (\ldotsm 00010111)↓2,\qquad q = (\ldotsm 00001011)↓2.$$
The output sequence appears in the lower right portion of the
states of $M↓1$:$$\hbox{0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, $\ldotss $,}$$ representing
the number $(\ldotsm 01000011100)↓2$ from right to left.
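\yyskip Readers who wish to check Table 1, or to experiment further, can simulate the array directly from (45) and (46). The following Python sketch (ours, for illustration) keeps one state tuple per automaton, $n$ machines being more than enough (see exercise 11):

    def iterative_array(u, v, q, n):
        # u, v, q: n-bit inputs; returns u*v + q, collecting the output
        # bits w_0, w_1, ... of M_1 at times 1, 2, ...
        Z = (0,) * 10                # (c, x0, y0, x1, y1, x, y, z2, z1, z0)
        m, out = [Z] * n, []
        for t in range(2 * n):
            ub, vb, qb = (u >> t) & 1, (v >> t) & 1, (q >> t) & 1
            new = []
            for j in range(n):
                # M_1 sees a virtual left neighbor holding the inputs
                L = (3, 0, 0, 0, 0, ub, vb, qb, 0, 0) if j == 0 else m[j-1]
                R = m[j+1] if j + 1 < n else Z
                c, x0, y0, x1, y1, x, y, z2, z1, z0 = m[j]
                cl, xl, yl, z2l, z0r = L[0], L[5], L[6], L[7], R[9]
                cp = min(c + 1, 3) if cl == 3 else 0           # Eq. (45)
                x0p, y0p = (xl, yl) if c == 0 else (x0, y0)
                x1p, y1p = (xl, yl) if c == 1 else (x1, y1)
                xp, yp = (xl, yl) if c >= 2 else (x, y)
                s = z0r + z1 + z2l                             # Eq. (46)
                s += (xl*yl                 if c == 0 else
                      x0*yl + xl*y0         if c == 1 else
                      x0*yl + x1*y1 + xl*y0 if c == 2 else
                      x0*yl + x1*y + x*y1 + xl*y0)
                new.append((cp, x0p, y0p, x1p, y1p, xp, yp,
                            (s >> 2) & 1, (s >> 1) & 1, s & 1))
            m = new
            out.append(m[0][9])        # the z0 component of M_1
        return sum(b << t for t, b in enumerate(out))

In particular, iterative_array(0b10111, 0b10111, 0b1011, 8) returns $540=(1000011100)↓2$, in agreement with Table↔1.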
This construction is based on a similar one first published
by A. J. \α{Atrubin,} {\sl IEEE Trans. \bf EC--14} (1965),
394--399.
S. \α{Winograd} [{\sl JACM \bf 14} (1967), 793--802] has
investigated the minimum multiplication time achievable in a
logical circuit when $n$ is given and when the inputs are available
all at once in coded form. See also C. S. \α{Wallace,} {\sl IEEE
Trans.\ \bf EC--13} (1964), 14--17; A. C. \α{Yao}, to appear.
\exbegin{EXERCISES}
\exno 1. [22] The idea
expressed in (2) can be generalized to the decimal system, if
the radix 2 is replaced by 10. Using this generalization, calculate
2718 times 4742 (reducing this product of four-digit numbers
to three products of two-digit numbers, and reducing each of
the latter to products of one-digit numbers).
\exno 2. [M22] Prove that, in step C1 of Algorithm↔C\null, the value
of $R$ either stays the same or increases by one when we set
$R ← \lfloor \sqrt{Q}\rfloor $.\xskip (Therefore,
as observed in that step, we need not calculate a square root.)
\exno 3. [M23] Prove that the sequences $q↓k$, $r↓k$ defined in
Algorithm↔C satisfy the inequality $2↑{q↓k+1}(2r↓k)↑{r↓k}
≤ 2↑{q↓{k-1}+q↓k}$, when $k > 0$.
\trexno 4. [28] (K. \α{Baker}.)\xskip Show
that it is advantageous to evaluate the polynomial $W(x)$ at
the points $x = -r$, $\ldotss$, 0, $\ldotss$, $r$ instead of at the nonnegative
points $x = 0$, 1, $\ldotss$,↔$2r$ as in Algorithm↔C\null. The polynomial
$U(x)$ can be written $$U(x) = U↓e(x↑2) + xU↓o(x↑2),$$ and similarly
$V(x)$ and $W(x)$ can be expanded in this way; show how to exploit
this idea, obtaining faster calculations in steps C7 and C8.
\trexno 5. [35] Show that
if in step C1 of Algorithm↔C we set $R ← \lceil
\sqrt{2Q}\,\rceil + 1$ instead of $R ← \lfloor
\sqrt{Q}\rfloor $, with suitable initial values of $q↓0$, $q↓1$,
$r↓0$, and $r↓1$, then (19) can be improved to $t↓k ≤ q↓{k+1}2↑
{\sqrt{2\lg q↓{k+1}}}\,(\lg q↓{k+1})$.
\exno 6. [M23] Prove that the six
numbers in (22) are relatively prime in pairs.
\exno 7. [M23] Prove (23).
\trexno 8. [25] Prove that it takes only $O(K\log K)$ arithmetic operations to
evaluate the discrete Fourier transform (32), even when $K$ is not a power of 2.\xskip
\inx{fast Fourier transform}
[{\sl Hint:} Rewrite (32) in the form
$$\A u↓s=\omega↑{-s↑2/2}\sum↓{0≤t<K}\omega↑{(s+t)↑2/2}\omega↑{-t↑2/2}u↓t$$
and express this sum as a \α{convolution} product.]
\exno 9. [M15] Suppose the Fourier transformation method of the text is applied
with all occurrences of $\omega$ replaced by $\omega↑q$, where $q$ is some fixed
integer. Find a simple relation
between the numbers $(\s u↓0,\s u↓1,\ldotss,
\s u↓{K-1})$ obtained by this general procedure and the numbers $(\A u↓0,\A u↓1,
\ldotss,\A u↓{K-1})$ obtained when $q=1$.
\exno 10. [M26] The scaling in (37) makes it clear that all the complex numbers
$A↑{[j]}$
computed by pass $j$ of the transformation subroutine will be less than
$2↑{j-k}$ in absolute value, during the calculations of $\A u↓s$ and $\A v↓s$
in the \β{Sch\"onhage}--\α{Strassen} multiplication algorithm. Show that all of the
$A↑{[j]}$ will be less than 1 in absolute value during the {\sl third}
Fourier transformation (the calculation of $w↓r$).
\trexno 11. [M26] If $n$ is fixed, how many of the automata in
the linear iterative array (45), (46) are needed to compute
the product of $n$-bit numbers?\xskip (Note that the automaton $M↓j$
is influenced only by the component $z↑{r}↓{0}$ of the machine
on its right, so we may remove all automata whose $z↓0$ component
is always zero whenever the inputs are $n$-bit numbers.)
\trexno 12. [30] (A. Sch\"onhage.)\xskip The purpose of this exercise is to prove
that a simple form of \α{pointer machine} can multiply $n$-bit numbers in $O(n)$ steps.
The machine has no built-in facilities for arithmetic; all it does is work with
nodes and pointers. Each node has the same finite number of link fields,
and there are finitely many link registers. The only operations this machine can do
are:
\def\\(#1){\noindent\hbox to 30pt{\hfill #1) }\hangindent 30pt\!}
\yskip
\\(i) read one bit of input and jump if that bit is 0;
\\(ii) output 0 or 1;
\\(iii) load a register with the contents of another register or with
the contents of a link field in a node pointed to by a register;
\\(iv) store the contents of a register into a link field in a node pointed to by a
register;
\\(v) jump if two registers are equal;
\\(vi) create a new node and make a register point to it;
\\(vii) halt.
\yskip\noindent
Implement the Fourier-transform multiplication method efficiently on such a
ma\-chine.\xskip[{\sl Hints:} First show that if $N$ is any positive integer, it
is possible to create $N$ nodes representing the integers $\{0,1,\ldotss,N-1\}$,
where the node representing $p$ has pointers to the nodes representing $p+1$,
$\lfloor p/2\rfloor$, and $2p$. These nodes can be created in $O(N)$ steps.
Show that arithmetic with radix $N$ can now be simulated without difficulty:
for example, it takes $O(\log N)$ steps to find the node for $(p+q)\mod N$ and
to determine if $p+q≥N$, given pointers to $p$ and $q$; and multiplication
can be simulated in $O(\log N)↑2$ steps. Now consider the algorithm in the text,
with $k=l$ and $m=6k$ and $N=2↑{\lceil m/13\rceil}$, so that all quantities
in the fixed point arithmetic calculations are 13-place integers with radix↔$N$.
Finally, show that each pass of the fast Fourier transformations can be done in
$O\biglp K+(N\log N)↑2\bigrp=O(K)$ steps, using the following idea: Each of
the $K$ necessary assignments can be ``compiled'' into a bounded list of
instructions for a simulated \MIX-like computer whose word size is $N$, and
instructions for $K$ such machines acting in parallel can be simulated in
$O\biglp K+(N\log N)↑2\bigrp$ steps if they are first sorted so that all
identical instructions are performed together.\xskip(Two instructions are identical
if they have the same operation code, the same register contents, and the
same memory operand contents.)\xskip
Note that $N↑2=O(n↑{12/13})$, so $(N\log N)↑2=O(K)$.]
\exno 13. [M25] (A. Sch\"onhage.)\xskip What is a good upper
bound on the time needed to multiply an $m$-bit number by an
$n$-bit number, when both $m$ and $n$ are very large but $n$
is much larger than $m$, based on the results proved in this
section for $m = n$?
\exno 14. [M42] Write a program for Algorithm↔C\null, incorporating
the improvements of exercise↔4. Compare it with a program for
Algorithm 4.3.1M and with a program based on (2), to see how
large $n$ must be before Algorithm↔C is an improvement.
\exno 15. [M49] (S. A. \α{Cook}.)\xskip A multiplication algorithm is said to be
{\sl on line} if the $(k+1)$st input bits of the operands, from right to left,
are not read until the $k$th output bit has been produced. What are the fastest
possible on-line multiplication algorithms achievable on various species of
automata?
$\biglp$The best upper bound known is
$O\biglp n(\log n)↑2\log\log n\bigrp$, due to M. J. \α{Fischer} and L. J.
\α{Stockmeyer} [{\sl J. Comp.\ and Syst.\ Sci.\ \bf9} (1974), 317--331]; their
construction works on multitape Turing machines, hence also on pointer machines.
The best lower bound known is of order
$n\log n/\!\log\log n$, due to M. S. \α{Paterson},
M. J. \α{Fischer,} and A. R. \α{Meyer} [{\sl SIAM/AMS Proceedings \bf7} (1974), 97--111]; this applies
to multitape Turing machines but not to \α{pointer machines}.$\bigrp$
\vfill\end